POSITION DEPENDENT RECOGNITION OF
GNN NUCLEOTIDE TRIPLETS BY ZINC FINGERS 

CROSS-REFERENCES TO RELATED APPLICATIONS

The present application is a continuation-in-part of copending U.S. Patent
Application Serial No. 09/535,008, filed March 23, 2000, which application claims the
benefit of U.S. provisional applications 60/126,238, filed March 24, 1999; 60/126,239,
filed March 24, 1999; 60/146,595, filed July 30, 1999; and 60/146,615, filed July 30, 1999.
The present application is also a continuation-in-part of copending U.S. Patent
Application Serial No. 09/716,637, filed November 20, 2000. The disclosures of all of
the aforementioned applications are hereby incorporated by reference in their entireties
for all purposes.

BACKGROUND

Zinc finger proteins (ZFPs) are proteins that can bind to DNA in a sequence-specific
manner. Zinc fingers were first identified in the transcription factor TFIIIA from the
oocytes of the African clawed frog, Xenopus laevis. An exemplary motif characterizing
one class of these proteins (the C2H2 class) is -Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His
(where X is any amino acid) (SEQ ID NO:1). A single finger domain is about 30 amino
acids in length, and several structural studies have demonstrated that it contains an alpha
helix carrying the two invariant histidine residues and a beta turn carrying the two
invariant cysteine residues, with all four residues coordinating a zinc ion. To date, over
10,000 zinc finger sequences have been identified in several thousand known or putative
transcription factors. Zinc finger domains are involved not only in DNA recognition, but
also in RNA binding and in protein-protein binding. Current estimates are that this class
of molecules will constitute about 2% of all human genes.
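The C2H2 consensus above maps directly onto a regular expression over one-letter amino acid codes. The following Python sketch (not part of the original disclosure) scans a protein sequence for that motif. The test sequence is hypothetical, built only to satisfy the spacing, and the slice labeled `helix` assumes the canonical arrangement in which the first zinc-coordinating histidine occupies helix position +7, so that positions -1 to +6 are the last seven residues of the 12-residue spacer.

```python
import re

# C2H2 consensus: Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His, where X is any residue.
# The 12-residue Cys-to-His spacer is captured as group 1.
C2H2_MOTIF = re.compile(r"C.{2,4}C(.{12})H.{3,5}H")


def find_c2h2_fingers(protein: str) -> list[dict]:
    """Scan a one-letter protein sequence for non-overlapping C2H2 motif matches."""
    hits = []
    for match in C2H2_MOTIF.finditer(protein.upper()):
        spacer = match.group(1)
        hits.append({
            "start": match.start(),
            "end": match.end(),
            "finger": match.group(0),
            # Assumption: with a 12-residue spacer the first His is helix
            # position +7, so positions -1..+6 are the last seven spacer residues.
            "helix": spacer[-7:],
        })
    return hits


if __name__ == "__main__":
    # Hypothetical finger-like sequence, built only to satisfy the motif spacing.
    for hit in find_c2h2_fingers("YKCPECGKSFSRSDELTRHIRIH"):
        print(hit["start"], hit["end"], hit["helix"])  # prints "2 23 RSDELTR"
```

A regular expression is used only because the consensus is a fixed-spacing pattern; it says nothing about whether a matching sequence actually folds or binds zinc.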

The x-ray crystal structure of Zif268, a three-finger domain from a murine
transcription factor, has been solved in complex with a cognate DNA sequence and
shows that each finger can be superimposed on the next by a periodic rotation. The
structure suggests that each finger interacts independently with DNA over 3 base-pair
intervals, with side-chains at positions -1, 2, 3 and 6 on each recognition helix making
contacts with their respective DNA triplet subsites. The amino terminus of Zif268 is
situated at the 3' end of the DNA strand with which it makes most contacts. Some zinc
fingers can bind to a fourth base in a target segment. If the strand with which a zinc
finger protein makes most contacts is designated the target strand, some zinc finger
proteins bind to a three-base triplet in the target strand and a fourth base on the nontarget
strand. The fourth base is complementary to the base immediately 3' of the three-base
subsite.
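The fourth-base geometry can be restated as a small function. This is an illustrative sketch with a hypothetical function name: it takes the target strand written 5' to 3' and the offset of a three-base subsite on that strand, and returns the nontarget-strand base complementary to the base immediately 3' of the subsite. It encodes only the pairing rule stated above and does not predict whether a particular finger makes a fourth-base contact.

```python
# Watson-Crick complements for unambiguous DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def fourth_base(target_strand: str, subsite_start: int) -> str:
    """Nontarget-strand base paired with the base just 3' of a triplet subsite.

    target_strand is written 5'->3'; the subsite occupies
    target_strand[subsite_start:subsite_start + 3].
    """
    seq = target_strand.upper()
    next_index = subsite_start + 3  # first base 3' of the subsite
    if subsite_start < 0 or next_index >= len(seq):
        raise ValueError("no base lies immediately 3' of this subsite")
    return COMPLEMENT[seq[next_index]]


if __name__ == "__main__":
    # Hypothetical 10-base target strand; subsite GCG at offset 0 is followed by T.
    print(fourth_base("GCGTGGGCGA", 0))  # prints "A"
```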

The structure of the Zif268-DNA complex also suggested that the DNA sequence
specificity of a zinc finger protein might be altered by making amino acid substitutions
at the four helix positions (-1, 2, 3 and 6) on each of the zinc finger recognition helices.
Phage display experiments using zinc finger combinatorial libraries to test this
observation were published in a series of papers in 1994 (Rebar et al., Science 263,
671-673 (1994); Jamieson et al., Biochemistry 33, 5689-5695 (1994); Choo et al., PNAS 91,
11163-11167 (1994)). Combinatorial libraries were constructed with randomized side-chains
in either the first or middle finger of Zif268 and then used to select for an altered
Zif268 binding site in which the appropriate DNA sub-site was replaced by an altered
DNA triplet. Further, correlation between the nature of introduced mutations and the
resulting alteration in binding specificity gave rise to a partial set of substitution rules for
design of ZFPs with altered binding specificity.

Greisman & Pabo, Science 275, 657-661 (1997) discuss an elaboration of the 
phage display method in which each finger of a Zif268 was successively randomized and 
selected for binding to a new triplet sequence. This paper reported selection of ZFPs for a 
nuclear hormone response element, a p53 target site and a TATA box sequence. 

A number of papers have reported attempts to produce ZFPs to modulate
particular target sites. For example, Choo et al., Nature 372, 645 (1994), report an
attempt to design a ZFP that would repress expression of a bcr-abl oncogene. The target
segment to which the ZFPs would bind was a nine base sequence 5'GCA GAA GCC3'
chosen to overlap the junction created by a specific oncogenic translocation fusing the
genes encoding bcr and abl. The intention was that a ZFP specific to this target site
would bind to the oncogene without binding to abl or bcr component genes. The authors
used phage display to screen a mini-library of variant ZFPs for binding to this target
segment. A variant ZFP thus isolated was then reported to repress expression of a stably
transfected bcr-abl construct in a cell line.

Pomerantz et al., Science 267, 93-96 (1995) reported an attempt to design a novel 
DNA binding protein by fusing two fingers from Zif268 with a homeodomain from Oct-1.
The hybrid protein was then fused with a transcriptional activator for expression as a
chimeric protein. The chimeric protein was reported to bind a target site representing a 
hybrid of the subsites of its two components. The authors then constructed a reporter 
vector containing a luciferase gene operably linked to a promoter and a hybrid site for the 
chimeric DNA binding protein in proximity to the promoter. The authors reported that 
their chimeric DNA binding protein could activate expression of the luciferase gene. 

Liu et al., PNAS 94, 5525-5530 (1997) report forming a composite zinc finger 
protein by using a peptide spacer to link two component zinc finger proteins each having 
three fingers. The composite protein was then further linked to a transcriptional activation
domain. It was reported that the resulting chimeric protein bound to a target site formed
from the target segments bound by the two component zinc finger proteins. It was further 
reported that the chimeric zinc finger protein could activate transcription of a reporter 
gene when its target site was inserted into a reporter plasmid in proximity to a promoter 
operably linked to the reporter. 

Choo et al., WO 98/53058, WO 98/53059 and WO 98/53060 (1998) discuss
selection of zinc finger proteins to bind to a target site within the HIV Tat gene. Choo et 
al. also discuss selection of a zinc finger protein to bind to a target site encompassing a 
site of a common mutation in the oncogene ras. The target site within ras was thus 
constrained by the position of the mutation. 

Previously-disclosed methods for the design of sequence-specific zinc finger 
proteins have often been based on modularity of individual zinc fingers; i.e., the ability 
of a zinc finger to recognize the same target subsite regardless of the location of the 
finger in a multi-finger protein. Although a zinc finger in many instances retains the
same sequence specificity regardless of its location within a multi-finger protein, in
certain cases the sequence specificity of a zinc finger depends on its position. For
example, it is possible for a finger to recognize a particular triplet sequence when it is
present as finger 1 of a three-finger protein, but to recognize a different triplet sequence
when present as finger 2 of a three-finger protein.

Attempts to address situations in which a zinc finger behaves in a non-modular
fashion (i.e., its sequence specificity depends upon its location in a multi-finger protein)
have, to date, involved strategies employing randomization of key binding residues in
multiple adjacent zinc fingers, followed by selection. See, for example, Isalan et al.
(2001) Nature Biotechnol. 19:656-660. However, methods for rational design of
polypeptides containing non-modular zinc fingers have not heretofore been described.



The present disclosure provides compositions comprising and methods involving 
position dependent recognition of GNN nucleotide triplets by zinc fingers. 

Thus, provided herein is a zinc finger protein that binds to a target site, said zinc
finger protein comprising a first (F1), a second (F2), and a third (F3) zinc finger, ordered
F1, F2, F3 from N-terminus to C-terminus, said target site comprising, in 3' to 5'
direction, a first (S1), a second (S2), and a third (S3) target subsite, each target subsite
having the nucleotide sequence GNN, wherein if S1 comprises GAA, F1 comprises the
amino acid sequence QRSNLVR; if S2 comprises GAA, F2 comprises the amino acid
sequence QSGNLAR; if S3 comprises GAA, F3 comprises the amino acid sequence
QSGNLAR; if S1 comprises GAG, F1 comprises the amino acid sequence RSDNLAR; if
S2 comprises GAG, F2 comprises the amino acid sequence RSDNLAR; if S3 comprises
GAG, F3 comprises the amino acid sequence RSDNLTR; if S1 comprises GAC, F1
comprises the amino acid sequence DRSNLTR; if S2 comprises GAC, F2 comprises the
amino acid sequence DRSNLTR; if S3 comprises GAC, F3 comprises the amino acid
sequence DRSNLTR; if S1 comprises GAT, F1 comprises the amino acid sequence
QSSNLAR; if S2 comprises GAT, F2 comprises the amino acid sequence TSGNLVR; if
S3 comprises GAT, F3 comprises the amino acid sequence TSANLSR; if S1 comprises
GGA, F1 comprises the amino acid sequence QSGHLAR; if S2 comprises GGA, F2
comprises the amino acid sequence QSGHLQR; if S3 comprises GGA, F3 comprises the
amino acid sequence QSGHLQR; if S1 comprises GGG, F1 comprises the amino acid
sequence RSDHLAR; if S2 comprises GGG, F2 comprises the amino acid sequence
RSDHLSR; if S3 comprises GGG, F3 comprises the amino acid sequence RSDHLSR; if
S1 comprises GGC, F1 comprises the amino acid sequence DRSHLRT; if S2 comprises
GGC, F2 comprises the amino acid sequence DRSHLAR; if S1 comprises GGT, F1
comprises the amino acid sequence QSSHLTR; if S2 comprises GGT, F2 comprises the
amino acid sequence TSGHLSR; if S3 comprises GGT, F3 comprises the amino acid
sequence TSGHLVR; if S1 comprises GCA, F1 comprises the amino acid sequence
QSGSLTR; if S2 comprises GCA, F2 comprises QSGDLTR; if S3 comprises GCA, F3
comprises QSGDLTR; if S1 comprises GCG, F1 comprises the amino acid sequence
RSDDLTR; if S2 comprises GCG, F2 comprises the amino acid sequence RSDDLQR; if
S3 comprises GCG, F3 comprises the amino acid sequence RSDDLTR; if S1 comprises
GCC, F1 comprises the amino acid sequence ERGTLAR; if S2 comprises GCC, F2
comprises the amino acid sequence DRSDLTR; if S3 comprises GCC, F3 comprises the
amino acid sequence DRSDLTR; if S1 comprises GCT, F1 comprises the amino acid
sequence QSSDLTR; if S2 comprises GCT, F2 comprises the amino acid sequence
QSSDLTR; if S3 comprises GCT, F3 comprises the amino acid sequence QSSDLQR; if
S1 comprises GTA, F1 comprises the amino acid sequence QSGALTR; if S2 comprises
GTA, F2 comprises the amino acid sequence QSGALAR; if S1 comprises GTG, F1
comprises the amino acid sequence RSDALTR; if S2 comprises GTG, F2 comprises the
amino acid sequence RSDALSR; if S3 comprises GTG, F3 comprises the amino acid
sequence RSDALTR; if S1 comprises GTC, F1 comprises the amino acid sequence
DRSALAR; if S2 comprises GTC, F2 comprises the amino acid sequence DRSALAR;
and if S3 comprises GTC, F3 comprises the amino acid sequence DRSALAR.

Also provided are methods of designing a zinc finger protein comprising a first
(F1), a second (F2), and a third (F3) zinc finger, ordered F1, F2, F3 from N-terminus to
C-terminus, that binds to a target site comprising, in 3' to 5' direction, a first (S1), a
second (S2), and a third (S3) target subsite, each target subsite having the nucleotide
sequence GNN, the method comprising the steps of (a) selecting the F1 zinc finger such
that it binds to the S1 target subsite, wherein if S1 comprises GAA, F1 comprises the
amino acid sequence QRSNLVR; if S1 comprises GAG, F1 comprises the amino acid
sequence RSDNLAR; if S1 comprises GAC, F1 comprises the amino acid sequence
DRSNLTR; if S1 comprises GAT, F1 comprises the amino acid sequence QSSNLAR; if
S1 comprises GGA, F1 comprises the amino acid sequence QSGHLAR; if S1 comprises
GGG, F1 comprises the amino acid sequence RSDHLAR; if S1 comprises GGC, F1
comprises the amino acid sequence DRSHLRT; if S1 comprises GGT, F1 comprises the
amino acid sequence QSSHLTR; if S1 comprises GCA, F1 comprises QSGSLTR; if S1
comprises GCG, F1 comprises RSDDLTR; if S1 comprises GCC, F1 comprises ERGTLAR;
if S1 comprises GCT, F1 comprises the amino acid sequence QSSDLTR; if S1 comprises
GTA, F1 comprises the amino acid sequence QSGALTR; if S1 comprises GTG, F1
comprises the amino acid sequence RSDALTR; if S1 comprises GTC, F1 comprises the
amino acid sequence DRSALAR; (b) selecting the F2 zinc finger such that it binds to the
S2 target subsite, wherein if S2 comprises GAA, F2 comprises the amino acid sequence
QSGNLAR; if S2 comprises GAG, F2 comprises the amino acid sequence RSDNLAR; if
S2 comprises GAC, F2 comprises the amino acid sequence DRSNLTR; if S2 comprises
GAT, F2 comprises the amino acid sequence TSGNLVR; if S2 comprises GGA, F2
comprises the amino acid sequence QSGHLQR; if S2 comprises GGG, F2 comprises the
amino acid sequence RSDHLSR; if S2 comprises GGC, F2 comprises the amino acid
sequence DRSHLAR; if S2 comprises GGT, F2 comprises the amino acid sequence
TSGHLSR; if S2 comprises GCA, F2 comprises the amino acid sequence QSGDLTR; if
S2 comprises GCG, F2 comprises RSDDLQR; if S2 comprises GCC, F2 comprises the
amino acid sequence DRSDLTR; if S2 comprises GCT, F2 comprises the amino acid
sequence QSSDLTR; if S2 comprises GTA, F2 comprises the amino acid sequence
QSGALAR; if S2 comprises GTG, F2 comprises the amino acid sequence RSDALSR; if
S2 comprises GTC, F2 comprises the amino acid sequence DRSALAR; and (c) selecting
the F3 zinc finger such that it binds to the S3 target subsite, wherein if S3 comprises
GAA, F3 comprises the amino acid sequence QSGNLAR; if S3 comprises GAG, F3
comprises the amino acid sequence RSDNLTR; if S3 comprises GAC, F3 comprises the
amino acid sequence DRSNLTR; if S3 comprises GAT, F3 comprises the amino acid
sequence TSANLSR; if S3 comprises GGA, F3 comprises the amino acid sequence
QSGHLQR; if S3 comprises GGG, F3 comprises RSDHLSR; if S3 comprises GGT, F3
comprises the amino acid sequence TSGHLVR; if S3 comprises GCA, F3 comprises the
amino acid sequence QSGDLTR; if S3 comprises GCG, F3 comprises the amino acid
sequence RSDDLTR; if S3 comprises GCC, F3 comprises the amino acid sequence
DRSDLTR; if S3 comprises GCT, F3 comprises the amino acid sequence QSSDLQR; if
S3 comprises GTG, F3 comprises RSDALTR; and if S3 comprises GTC, F3 comprises
the amino acid sequence DRSALAR;

thereby designing a zinc finger protein that binds to a target site.
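Because each helix choice depends only on the subsite and the finger position, the design method above reduces to a table lookup. The sketch below transcribes the helices recited in the summary into a Python dictionary and assembles F1, F2 and F3 for a 9-base target written 5' to 3', reading S1 from the 3' end. The names `GNN_HELICES` and `design_three_finger` are hypothetical; positions with no recited helix (F3 for GGC and GTA, and every position for GTT) raise an error rather than being filled in.

```python
# Position-dependent GNN recognition helices (residues -1..+6) as recited in the
# summary. Value order is (F1, F2, F3); None marks a position with no recited helix.
GNN_HELICES = {
    "GAA": ("QRSNLVR", "QSGNLAR", "QSGNLAR"),
    "GAG": ("RSDNLAR", "RSDNLAR", "RSDNLTR"),
    "GAC": ("DRSNLTR", "DRSNLTR", "DRSNLTR"),
    "GAT": ("QSSNLAR", "TSGNLVR", "TSANLSR"),
    "GGA": ("QSGHLAR", "QSGHLQR", "QSGHLQR"),
    "GGG": ("RSDHLAR", "RSDHLSR", "RSDHLSR"),
    # F1 for GGC follows the summary text (DRSHLRT); the embodiments list DRSHLTR.
    "GGC": ("DRSHLRT", "DRSHLAR", None),
    "GGT": ("QSSHLTR", "TSGHLSR", "TSGHLVR"),
    "GCA": ("QSGSLTR", "QSGDLTR", "QSGDLTR"),
    "GCG": ("RSDDLTR", "RSDDLQR", "RSDDLTR"),
    "GCC": ("ERGTLAR", "DRSDLTR", "DRSDLTR"),
    "GCT": ("QSSDLTR", "QSSDLTR", "QSSDLQR"),
    "GTA": ("QSGALTR", "QSGALAR", None),
    "GTG": ("RSDALTR", "RSDALSR", "RSDALTR"),
    "GTC": ("DRSALAR", "DRSALAR", "DRSALAR"),
}


def design_three_finger(target_5to3: str) -> tuple[str, ...]:
    """Return (F1, F2, F3) helices, N- to C-terminus, for a 9-base GNNGNNGNN site.

    S1 is the 3'-most triplet, so it is read from the end of the 5'->3' string.
    """
    site = target_5to3.upper()
    if len(site) != 9:
        raise ValueError("expected a 9-base target site")
    subsites = (site[6:9], site[3:6], site[0:3])  # S1, S2, S3
    helices = []
    for position, triplet in enumerate(subsites):
        row = GNN_HELICES.get(triplet)
        helix = row[position] if row else None
        if helix is None:
            raise KeyError(f"no helix recited for {triplet} at F{position + 1}")
        helices.append(helix)
    return tuple(helices)
```

For example, the hypothetical site 5'GCGGAGGCA3' yields QSGSLTR (F1, for GCA), RSDNLAR (F2, for GAG) and RSDDLTR (F3, for GCG). The same triplet can map to different helices at different positions (GAT gives QSSNLAR, TSGNLVR and TSANLSR), which is the position dependence the method relies on.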

In certain embodiments of the zinc finger proteins and methods described herein,
S1 comprises GAA and F1 comprises the amino acid sequence QRSNLVR. In other
embodiments, S2 comprises GAA and F2 comprises the amino acid sequence
QSGNLAR. In other embodiments, S3 comprises GAA and F3 comprises the amino acid
sequence QSGNLAR. In other embodiments, S1 comprises GAG and F1 comprises the
amino acid sequence RSDNLAR. In other embodiments, S2 comprises GAG and F2
comprises the amino acid sequence RSDNLAR. In other embodiments, S3 comprises
GAG and F3 comprises the amino acid sequence RSDNLTR. In other embodiments, S1
comprises GAC and F1 comprises the amino acid sequence DRSNLTR. In other
embodiments, S2 comprises GAC and F2 comprises the amino acid sequence
DRSNLTR. In other embodiments, S3 comprises GAC and F3 comprises the amino acid
sequence DRSNLTR. In other embodiments, S1 comprises GAT and F1 comprises the
amino acid sequence QSSNLAR. In other embodiments, S2 comprises GAT and F2
comprises the amino acid sequence TSGNLVR. In other embodiments, S3 comprises
GAT and F3 comprises the amino acid sequence TSANLSR. In other embodiments, S1
comprises GGA and F1 comprises the amino acid sequence QSGHLAR. In other
embodiments, S2 comprises GGA and F2 comprises the amino acid sequence
QSGHLQR. In other embodiments, S3 comprises GGA and F3 comprises the amino acid
sequence QSGHLQR. In other embodiments, S1 comprises GGG and F1 comprises the
amino acid sequence RSDHLAR. In other embodiments, S2 comprises GGG and F2
comprises the amino acid sequence RSDHLSR. In other embodiments, S3 comprises
GGG and F3 comprises the amino acid sequence RSDHLSR. In other embodiments, S1
comprises GGC and F1 comprises the amino acid sequence DRSHLTR. In other
embodiments, S2 comprises GGC and F2 comprises the amino acid sequence
DRSHLAR. In other embodiments, S1 comprises GGT and F1 comprises the amino acid
sequence QSSHLTR. In other embodiments, S2 comprises GGT and F2 comprises the
amino acid sequence TSGHLSR. In other embodiments, S3 comprises GGT and F3
comprises the amino acid sequence TSGHLVR. In other embodiments, S1 comprises
GCA and F1 comprises the amino acid sequence QSGSLTR. In other embodiments, S2
comprises GCA and F2 comprises the amino acid sequence QSGDLTR. In other
embodiments, S3 comprises GCA and F3 comprises the amino acid sequence
QSGDLTR. In other embodiments, S1 comprises GCG and F1 comprises the amino acid
sequence RSDDLTR. In other embodiments, S2 comprises GCG and F2 comprises the
amino acid sequence RSDDLQR. In other embodiments, S3 comprises GCG and F3
comprises the amino acid sequence RSDDLTR. In other embodiments, S1 comprises
GCC and F1 comprises the amino acid sequence ERGTLAR. In other embodiments, S2
comprises GCC and F2 comprises the amino acid sequence DRSDLTR. In other
embodiments, S3 comprises GCC and F3 comprises the amino acid sequence DRSDLTR.
In other embodiments, S1 comprises GCT and F1 comprises the amino acid sequence
QSSDLTR. In other embodiments, S2 comprises GCT and F2 comprises the amino acid
sequence QSSDLTR. In other embodiments, S3 comprises GCT and F3 comprises the
amino acid sequence QSSDLQR. In other embodiments, S1 comprises GTA and F1
comprises the amino acid sequence QSGALTR. In other embodiments, S2 comprises
GTA and F2 comprises the amino acid sequence QSGALAR. In other embodiments, S1
comprises GTG and F1 comprises the amino acid sequence RSDALTR. In other
embodiments, S2 comprises GTG and F2 comprises the amino acid sequence RSDALSR.
In other embodiments, S3 comprises GTG and F3 comprises the amino acid sequence
RSDALTR. In other embodiments, S1 comprises GTC and F1 comprises the amino acid
sequence DRSALAR. In other embodiments, S2 comprises GTC and F2 comprises the
amino acid sequence DRSALAR. In other embodiments, S3 comprises GTC and F3
comprises the amino acid sequence DRSALAR.

Also provided are polypeptides comprising any of the zinc finger proteins described
herein. In certain embodiments, the polypeptide further comprises at least one functional
domain. Also provided are polynucleotides encoding any of the polypeptides described
herein. Thus, also provided are nucleic acids encoding zinc fingers, including all of the
zinc fingers described above.

Also provided are segments of a zinc finger comprising a sequence of seven
contiguous amino acids as shown herein. Also provided are nucleic acids encoding any 
of these segments and zinc fingers comprising the same. 

Also provided are zinc finger proteins comprising first, second and third zinc 
fingers. The first, second and third zinc fingers comprise respectively first, second and 
third segments of seven contiguous amino acids as shown herein. Also provided are 
nucleic acids encoding such zinc finger proteins. 



Figure 1 shows results of site selection analysis of two representative zinc finger
proteins (leftmost 4 columns) and measurements of binding affinity of each of these
proteins to their intended target sequences and to variant target sequences (rightmost 3
columns). Analysis of ZFP1 is shown in the upper portion of the figure and analysis of
ZFP2 is shown in the lower portion of the figure. For the site selection analyses, the
amino acid sequences of residues -1 through +6 of the recognition helix of each of the
three component zinc fingers (F3, F2 and F1) are shown across the top row; the intended
target sequence (divided into finger-specific target subsites) is shown across the second
row, and a summary of the sequences bound is shown in the third row. Data for F3 are
shown in the second column, data for F2 in the third column, and data for F1 in the
fourth column.

For the binding affinity analyses, the designed target sequence for each ZFP
("cognate") and two related sequences ("Mt") are shown (column 6), along with the Kd
for binding of the ZFP to each of these sequences (column 7).

Figure 2 shows amino acid sequences of zinc finger recognition regions (amino
acids -1 through +6 of the recognition helix) that bind to each of the 16 GNN triplet
subsites. Three amino acid sequences are shown for each trinucleotide subsite; these
correspond to optimal amino acid sequences for recognition of the subsite from each of
the three positions (finger 1, F1; finger 2, F2; or finger 3, F3) in a three-finger zinc finger
protein. Amino acid sequences are from N-terminal to C-terminal; nucleotide sequences
Also shown are site selection results for each of the 48 position-dependent
GNN-recognizing zinc fingers. These show the number of times a particular nucleotide was
present at a given position in a collection of oligonucleotide sequences bound by the
finger. For example, out of 15 oligonucleotides bound by a zinc finger protein with the
amino acid sequence QSGHLAR present at the finger 1 (F1) position, 15 contained a G
in the 5'-most position of the subsite, 15 contained a G in the middle position of the
subsite, while, at the 3'-most position of the subsite, 10 contained an A, 3 contained a G
and 2 contained a T. Accordingly, this particular amino acid sequence is optimal for
binding a GGA triplet from the F1 position.
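The site selection summaries amount to per-position base counts from which a preferred triplet is read off. A minimal sketch, assuming the counts are supplied as one dictionary per subsite position (5' to 3'), with hypothetical function names, and using the QSGHLAR/F1 counts given above as input:

```python
def consensus_call(counts_per_position: list[dict[str, int]]) -> str:
    """Most frequent base at each subsite position, read 5' to 3'.

    Ties are broken alphabetically, an arbitrary convention for this sketch.
    """
    calls = []
    for counts in counts_per_position:
        # sorted() fixes the iteration order, so max() keeps the
        # alphabetically first base among equally frequent ones.
        calls.append(max(sorted(counts), key=lambda base: counts[base]))
    return "".join(calls)


def base_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of selected oligonucleotides carrying each base at one position."""
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


if __name__ == "__main__":
    # Counts quoted in the text for QSGHLAR at the F1 position (15 oligonucleotides).
    qsghlar_f1 = [{"G": 15}, {"G": 15}, {"A": 10, "G": 3, "T": 2}]
    print(consensus_call(qsghlar_f1))  # prints "GGA"
    print(round(base_fractions(qsghlar_f1[2])["A"], 2))  # prints 0.67
```

A simple majority call hides how strongly a position is specified; the fractions make visible that the 3'-most base (A in 10 of 15) is less tightly constrained than the two G positions.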

Figures 3A, 3B and 3C show site selection data indicating positional dependence
of GCA-, GAT- and GGT-binding zinc fingers. The first and fourth (where applicable)
rows of each figure show portions of the amino acid sequence of a designed zinc finger
protein. Amino acid residues -1 through +6 of each alpha-helix are listed from left to right.
The second and fifth (where applicable) rows show the target sequence, divided into three
triplet subsites, one for each finger of the protein shown in the first and fourth (where
applicable) rows, respectively. The third and sixth (where applicable) rows show the
distribution of nucleotides in the oligonucleotides obtained by site selection with the
proteins shown in the first and fourth (where applicable) rows, respectively. Figure 3A
shows data for fingers designed to bind GCA; Figure 3B shows data for fingers designed
to bind GAT; Figure 3C shows data for fingers designed to bind GGT.

Figures 4A and 4B show properties of the engineered ZFP EP2C. Figure 4A
shows site selection data. The first row provides the amino acid sequences of residues -1
through +6 of the recognition helices for each of the three zinc fingers of the EP2C
protein. The second row shows the target sequence (5' to 3'), with the distribution of
nucleotides in the oligonucleotides obtained by site selection indicated below the target
sequence.

Figure 4B shows in vitro and in vivo assays for the binding specificity of EP2C. 
The first three columns show in vitro measurements of binding affinity of EP2C to its 
intended target sequence and several related sequences. The first column gives the name 
of each sequence (2C0 is the intended target sequence, compare to Figure 4A). The 
second column shows the nucleotide sequence of various target sequences, with 
differences from the intended target sequence (2C0) highlighted. The third column
shows the Kd (in nM) for binding of EP2C to each of the target sequences. Kds were
determined by gel shift assays, using 2-fold dilution series of EP2C. The right side of the
figure (fourth column and bar graph) shows relative luciferase activities (normalized to
beta-galactosidase levels) in stable cell lines in which expression of EP2C is inducible.
Cells were co-transfected with a vector containing a luciferase coding region under the
transcriptional control of the target sequence shown in the same row of the figure, and a
control vector encoding beta-galactosidase. Luciferase and beta-galactosidase levels were
measured after induction of EP2C expression. Triplicate samples were assayed and the 
standard deviations are shown in the bar graph. pGL3 is a luciferase-encoding vector 
lacking EP2C target sequences. 3B is another negative control, in which luciferase 
expression is under transcriptional control of sequences (3B) unrelated to the EP2C target 
sequence. 
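The luciferase data in Figure 4B are described as normalized to beta-galactosidase, assayed in triplicate and reported with standard deviations. The sketch below shows one way to compute such values. It assumes that each luciferase reading is paired with the beta-galactosidase reading from the same replicate, uses the sample (n - 1) standard deviation, and treats a control such as pGL3 as the reference for relative activity; the function names and these choices are illustrative assumptions, not the procedure used for the figure.

```python
from statistics import mean, stdev


def normalized_activity(luciferase: list[float], beta_gal: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of luciferase / beta-galactosidase ratios.

    Readings at the same index are assumed to come from the same replicate.
    """
    if len(luciferase) != len(beta_gal) or len(luciferase) < 2:
        raise ValueError("need at least two paired replicates")
    ratios = [luc / gal for luc, gal in zip(luciferase, beta_gal)]
    return mean(ratios), stdev(ratios)


def relative_to_control(sample_mean: float, control_mean: float) -> float:
    """Express a normalized activity relative to a control such as pGL3 (an assumption)."""
    return sample_mean / control_mean
```

With hypothetical readings of 300, 200 and 250 luciferase units against 100 beta-galactosidase units each, `normalized_activity` returns a mean ratio of 2.5 with a standard deviation of 0.5.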



A zinc finger DNA binding protein is a protein or segment within a larger protein 
that binds DNA in a sequence-specific manner as a result of stabilization of protein 
structure through coordination of a zinc ion. The term zinc finger DNA binding protein 
is often abbreviated as zinc finger protein or ZFP. 

Zinc finger proteins can be engineered to recognize a selected target sequence in a 
nucleic acid. Any method known in the art or disclosed herein can be used to construct 
an engineered zinc finger protein or a nucleic acid encoding an engineered zinc finger 
protein. These include, but are not limited to, rational design, selection methods (e.g.,
phage display), random mutagenesis, combinatorial libraries, computer design, affinity
selection, use of databases matching zinc finger amino acid sequences with target subsite 
nucleotide sequences, cloning from cDNA and/or genomic libraries, and synthetic 
constructions. An engineered zinc finger protein can comprise a new combination of 
naturally-occurring zinc finger sequences. Methods for engineering zinc finger proteins 
are disclosed in co-owned WO 00/41566 and WO 00/42219; as well as in WO 98/53057; 
WO 98/53058; WO 98/53059 and WO 98/53060; the disclosures of which are hereby 
incorporated by reference in their entireties. Methods for identifying preferred target 
sequences, and for engineering zinc finger proteins to bind to such preferred target
sequences, are disclosed in co-owned WO 00/42219. 

A designed zinc finger protein is a protein not occurring in nature whose 
design/composition results principally from rational criteria. Rational criteria for design 
include application of substitution rules and computerized algorithms for processing 
information in a database storing information of existing ZFP designs and binding data. 

A selected zinc finger protein is a protein not found in nature whose production 
results primarily from an empirical process such as phage display. 

The term naturally-occurring is used to describe an object that can be found in 
nature as distinct from being artificially produced by man. For example, a polypeptide or 
polynucleotide sequence that is present in an organism (including viruses) that can be 
isolated from a source in nature and which has not been intentionally modified by man in 
the laboratory is naturally-occurring. Generally, the term naturally-occurring refers to an 
object as present in a non-pathological (undiseased) individual, such as would be typical 
for the species. 

A nucleic acid is operably linked when it is placed into a functional relationship 
with another nucleic acid sequence. For instance, a promoter or enhancer is operably 
linked to a coding sequence if it increases the transcription of the coding sequence. 
Operably linked means that the DNA sequences being linked are typically contiguous 
and, where necessary to join two protein coding regions, contiguous and in reading 
frame. However, since enhancers generally function when separated from the promoter 
by up to several kilobases or more and intronic sequences may be of variable lengths, 
some polynucleotide elements may be operably linked but not contiguous. 

A specific binding affinity between, for example, a ZFP and a specific target site
means a binding affinity of at least 1 x 10^6 M^-1.
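Binding affinities elsewhere in this document are reported as dissociation constants (Kd, in nM), whereas the definition above is stated as an association constant. For simple 1:1 binding Ka = 1/Kd, so a Ka of at least 1 x 10^6 M^-1 corresponds to a Kd of at most 1 µM (1000 nM). A small conversion helper, with hypothetical names:

```python
# Threshold from the definition above, in M^-1.
SPECIFIC_KA_THRESHOLD = 1e6


def ka_from_kd_nm(kd_nm: float) -> float:
    """Association constant in M^-1 from a dissociation constant in nM (Ka = 1/Kd)."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return 1e9 / kd_nm  # 1 nM = 1e-9 M, so 1 / (kd_nm * 1e-9) = 1e9 / kd_nm


def is_specific(kd_nm: float) -> bool:
    """True when Ka >= 1e6 M^-1, i.e. Kd <= 1000 nM (1 uM)."""
    return ka_from_kd_nm(kd_nm) >= SPECIFIC_KA_THRESHOLD
```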

The terms "modulating expression", "inhibiting expression" and "activating
expression" of a gene refer to the ability of a zinc finger protein to activate or inhibit
transcription of a gene. Activation includes prevention of subsequent transcriptional
inhibition (i.e., prevention of repression of gene expression) and inhibition includes
prevention of subsequent transcriptional activation (i.e., prevention of gene activation).
Modulation can be assayed by determining any parameter that is indirectly or directly
affected by the expression of the target gene. Such parameters include, e.g., changes in
RNA or protein levels, changes in protein activity, changes in product levels, changes in
downstream gene expression, changes in reporter gene transcription (e.g., luciferase, CAT,
beta-galactosidase or GFP; see, e.g., Misteli & Spector, Nature Biotechnology 15:961-964
(1997)), changes in signal transduction, phosphorylation and dephosphorylation,
receptor-ligand interactions, second messenger concentrations (e.g., cGMP, cAMP, IP3,
and Ca2+), cell growth and neovascularization, whether measured in vitro, in vivo or
ex vivo. Such functional effects can be measured by any means known to those skilled in
the art, e.g., measurement of RNA or protein levels, measurement of RNA stability,
identification of downstream or reporter gene expression, e.g., via chemiluminescence,
fluorescence, colorimetric reactions, antibody binding, inducible markers, ligand binding
assays; changes in intracellular second messengers such as cGMP and inositol
triphosphate (IP3); changes in intracellular calcium levels; cytokine release, and the like.

A "regulatory domain" refers to a protein or a protein subsequence that has 
transcriptional modulation activity. Typically, a regulatory domain is covalently or non- 
covalently linked to a ZFP to modulate transcription. Alternatively, a ZFP can act alone, 
without a regulatory domain, or with multiple regulatory domains to modulate 
transcription. 

A D-able subsite within a target site has the motif 5'NNGK3'. A target site
containing one or more such motifs is sometimes described as a D-able target site. A
zinc finger appropriately designed to bind to a D-able subsite is sometimes referred to as
a D-able finger. Likewise, a zinc finger protein containing at least one finger designed or
selected to bind to a target site including at least one D-able subsite is sometimes referred
to as a D-able zinc finger protein.
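Scanning a target site for the 5'NNGK3' motif (K being the IUPAC code for G or T) is a simple pattern search. The sketch below, with hypothetical function names, reports every offset at which the motif begins and then, as an assumption about how subsites are framed, keeps only offsets that coincide with a finger triplet when the first triplet starts at offset 0. The example site is hypothetical.

```python
import re

# 5'NNGK3': two arbitrary bases, a G, then K (IUPAC code for G or T).
# A zero-width lookahead lets overlapping occurrences all be reported.
D_ABLE_MOTIF = re.compile(r"(?=([ACGT]{2}G[GT]))")


def d_able_offsets(target_5to3: str) -> list[int]:
    """0-based offsets on the 5'->3' strand at which a 5'NNGK3' motif begins."""
    return [m.start() for m in D_ABLE_MOTIF.finditer(target_5to3.upper())]


def d_able_subsite_offsets(target_5to3: str) -> list[int]:
    """Keep only motifs whose NNG part coincides with a finger triplet.

    Assumes the first triplet starts at offset 0, so triplets begin at 0, 3, 6, ...
    """
    return [i for i in d_able_offsets(target_5to3) if i % 3 == 0]


if __name__ == "__main__":
    site = "GCGTGGGCG"  # hypothetical 9-base target site
    print(d_able_offsets(site))          # prints [0, 2, 3]
    print(d_able_subsite_offsets(site))  # prints [0, 3]
```

In a 9-base site the 3'-most triplet has no base on its 3' side, so it can only be tested for the motif when a tenth base is present, as in the 10-base target sites mentioned in the tables.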



DETAILED DESCRIPTION 

I. General 

Tables 1-5 list a collection of non-naturally occurring zinc finger protein
sequences and their corresponding target sites. The first column of each table is an
internal reference number. The second column lists a 9 or 10 base target site bound by a
three-finger zinc finger protein, with the target sites listed in 5' to 3' orientation. The
third column provides SEQ ID NOs for the target site sequences listed in column 2. The
fourth, sixth and eighth columns list amino acid residues from the first, second and third
fingers, respectively, of a zinc finger protein which recognizes the target sequence listed
in the second column. For each finger, seven amino acids, occupying positions -1 to +6
of the finger, are listed. The numbering convention for zinc fingers is defined below.
Columns 5, 7 and 9 provide SEQ ID NOs for the amino acid sequences listed in columns
4, 6 and 8, respectively. The final column of each table lists the binding affinity (i.e., the
Kd in nM) of the zinc finger protein for its target site. Binding affinities are measured as
described below.

Each finger binds to a triplet of bases within a corresponding target sequence.
The first finger binds to the first triplet starting from the 3' end of a target site, the second
finger binds to the second triplet, and the third finger binds the third (i.e., the 5'-most)
triplet of the target sequence. For example, the RSDSLTS finger (SEQ ID NO: 646) of
SBS# 201 (Table 2) binds to 5'TTG3', the ERSTLTR finger (SEQ ID NO: 851) binds
to 5'GCC3' and the QRADLRR finger (SEQ ID NO: 1056) binds to 5'GCA3'.
Table 6 lists a collection of consensus sequences for zinc fingers and the target
sites bound by such sequences. Conventional one-letter amino acid codes are used to
designate amino acids occupying consensus positions. The symbol "X" designates a
nonconsensus position that can in principle be occupied by any amino acid. In most zinc
fingers of the C2H2 type, binding specificity is principally conferred by residues -1, +2,
+3 and +6. Accordingly, consensus sequences determining binding specificity typically
include at least these residues. Consensus sequences are useful for designing zinc fingers
to bind to a given target sequence. Residues occupying other positions can be selected
based on sequences in Tables 1-5, or other known zinc finger sequences. Alternatively,
these positions can be randomized with a plurality of candidate amino acids and screened
against one or more target sequences to refine or improve binding specificity. In
general, the same consensus sequence can be used for design of a zinc finger regardless
of the relative position of that finger in a multi-finger zinc finger protein. For example,
the sequence RXDNXXR can be used to design an N-terminal, central or C-terminal
finger of a three-finger protein. However, some consensus sequences are most suitable
for designing a zinc finger to occupy a particular position in a multi-finger protein. For
example, the consensus sequence RXDHXXQ is most suitable for designing a
[...]-terminal finger of a three-finger protein.

II. Characteristics of Zinc Finger Proteins

Zinc finger proteins are formed from zinc finger components. For example, zinc
finger proteins can have one to thirty-seven fingers, commonly having 2, 3, 4, 5 or 6
fingers. A zinc finger protein recognizes and binds to a target site (sometimes referred to
as a target segment) that represents a relatively small subsequence within a target gene.
Each component finger of a zinc finger protein can bind to a subsite within the target site.
The subsite includes a triplet of three contiguous bases all on the same strand (sometimes
referred to as the target strand). The subsite may or may not also include a fourth base on
the opposite strand that is the complement of the base immediately 3' of the three
contiguous bases on the target strand. In many zinc finger proteins, a zinc finger binds to
its triplet subsite substantially independently of other fingers in the same zinc finger
protein. Accordingly, the binding specificity of a zinc finger protein containing multiple
fingers is usually approximately the aggregate of the specificities of its component
fingers. For example, if a zinc finger protein is formed from first, second and third
fingers that individually bind to triplets XXX, YYY, and ZZZ, the binding specificity of
the zinc finger protein is 3'XXX YYY ZZZ5'.

The relative order of fingers in a zinc finger protein from N-terminal to C-
terminal determines the relative order of triplets in the 3' to 5' direction in the target.
For example, if a zinc finger protein comprises from N-terminal to C-terminal first,
second and third fingers that individually bind, respectively, to triplets 5'GAC3',
5'GTA3' and 5'GGC3', then the zinc finger protein binds to the target segment
3'CAGATGCGG5'. If the zinc finger protein comprises the fingers in another order, for
example, second finger, first finger, third finger, then the zinc finger protein binds to a
target segment comprising a different permutation of triplets, in this example,
3'ATGCAGCGG5' (see Berg & Shi, Science 271, 1081-1085 (1996)). The assessment
of binding properties of a zinc finger protein as the aggregate of its component fingers
may, in some cases, be influenced by context-dependent interactions of multiple fingers
binding in the same protein.
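The ordering rule can be checked with a few lines of Python. This is a minimal sketch, assuming each finger's triplet is written 5'->3' and that fingers bind independently (the aggregate model, which context-dependent interactions can violate); the helper names are hypothetical.

```python
def target_site_5to3(finger_triplets):
    """Return the target site (5'->3') for fingers listed N- to C-terminal.

    Each triplet is written 5'->3'. Finger 1 (N-terminal) binds the 3'-most
    triplet, so the C-terminal finger's triplet comes first when read 5'->3'.
    """
    return "".join(reversed(finger_triplets))


def read_3to5(seq_5to3):
    """Rewrite a 5'->3' string of the same strand in 3'->5' orientation."""
    return seq_5to3[::-1]


if __name__ == "__main__":
    site = target_site_5to3(["GAC", "GTA", "GGC"])
    print(site, read_3to5(site))  # GGCGTAGAC CAGATGCGG
    swapped = target_site_5to3(["GTA", "GAC", "GGC"])  # second, first, third
    print(read_3to5(swapped))  # ATGCAGCGG
```

Both outputs reproduce the 3'CAGATGCGG5' and 3'ATGCAGCGG5' segments given in the example.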
Two or more zinc finger proteins can be linked to have a target specificity that is
the aggregate of that of the component zinc finger proteins (see e.g., Kim & Pabo, PNAS
95, 2812-2817 (1998)). For example, a first zinc finger protein having first, second and
third component fingers that respectively bind to XXX, YYY and ZZZ can be linked to a
second zinc finger protein having first, second and third component fingers with binding
specificities AAA, BBB and CCC. The binding specificity of the combined first and
second proteins is thus 3'XXXYYYZZZ_AAABBBCCC5', where the underscore
indicates a short intervening region (typically 0-5 bases of any type). In this situation, the 
target site can be viewed as comprising two target segments separated by an intervening 
segment. 

Linkage can be accomplished using any of the following peptide linkers:
TGEKP (SEQ. ID. No:2) (Liu et al., 1997, supra); (G4S)n (SEQ. ID. No:3) (Kim et
al., PNAS 93, 1156-1160 (1996)); GGRRGGGS (SEQ. ID. No:4); LRQRDGERP (SEQ.
ID. No:5); LRQKDGGGSERP (SEQ. ID. No:6); LRQKD(G3S)2ERP (SEQ. ID. No:7).
Alternatively, flexible linkers can be rationally designed using computer programs
capable of modeling both DNA-binding sites and the peptides themselves, or by phage
display methods. In a further variation, noncovalent linkage can be achieved by fusing
two zinc finger proteins with domains promoting heterodimer formation of the two zinc
finger proteins. For example, one zinc finger protein can be fused with fos and the other
with jun (see Barbas et al., WO 95/19431).
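Linker shorthand such as (G4S)n or LRQKD(G3S)2ERP has to be expanded into a plain peptide sequence before it is placed between two ZFPs. The sketch below is a minimal illustration, assuming a numeric repeat count is substituted for the symbolic n. `expand_linker`, `fuse_zfps` and the placeholder ZFP strings are hypothetical, not sequences from this disclosure.

```python
import re


def _expand_unit(unit):
    """'G4S' -> 'GGGGS': each residue letter may carry a repeat count."""
    return "".join(aa * int(count or 1)
                   for aa, count in re.findall(r"([A-Z])(\d*)", unit))


def expand_linker(shorthand):
    """Expand shorthand such as '(G4S)3' or 'LRQKD(G3S)2ERP' to a sequence.

    A numeric repeat count must replace the symbolic 'n' of (G4S)n.
    """
    with_groups = re.sub(r"\(([A-Z0-9]+)\)(\d+)",
                         lambda m: _expand_unit(m.group(1)) * int(m.group(2)),
                         shorthand)
    if not re.fullmatch(r"[A-Z0-9]*", with_groups):
        raise ValueError("unsupported linker shorthand: " + shorthand)
    return _expand_unit(with_groups)


def fuse_zfps(zfp_a, linker_shorthand, zfp_b):
    """Join two ZFP sequences (N- to C-terminal) through the expanded linker."""
    return zfp_a + expand_linker(linker_shorthand) + zfp_b


if __name__ == "__main__":
    print(expand_linker("LRQKD(G3S)2ERP"))  # LRQKDGGGSGGGSERP
    print(expand_linker("(G4S)3"))          # GGGGSGGGGSGGGGS
```

Rejecting an unexpanded "(G4S)n" is a deliberate design choice: silently dropping the repeat would produce a linker of the wrong length.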

Linkage of two zinc finger proteins is advantageous for conferring a unique
binding specificity within a mammalian genome. A typical mammalian diploid genome
consists of 3 x 10^9 bp. Assuming that the four nucleotides A, C, G, and T are randomly
distributed, and counting both strands, a given 9 bp sequence is present ~23,000 times.
Thus a ZFP recognizing a 9 bp target with absolute specificity would have the potential
to bind to ~23,000 sites within the genome. An 18 bp sequence is present once in
3.4 x 10^10 bp, or about once in a random DNA sequence whose complexity is ten times
that of a mammalian genome.
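These figures follow from a short calculation, sketched here under the same assumptions as the text: random, equiprobable bases, a 3 x 10^9 bp genome, and both strands searched. Edge effects at sequence ends are ignored.

```python
def expected_occurrences(site_length, genome_bp=3e9, strands=2):
    """Expected copies of one specific site in random sequence of genome_bp,
    assuming each of the 4 bases occurs with probability 1/4 at each position."""
    return strands * genome_bp / 4 ** site_length


def bp_per_occurrence(site_length, strands=2):
    """Length of random sequence (bp) expected to contain one copy of the site."""
    return 4 ** site_length / strands


if __name__ == "__main__":
    print(round(expected_occurrences(9)))   # 22888, i.e. ~23,000
    print(f"{bp_per_occurrence(18):.2e}")   # 3.44e+10
    print(bp_per_occurrence(18) / 3e9)      # ~11.45 genome equivalents
```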

A component finger of a zinc finger protein typically contains about 30 amino acids
and has the following motif (N-C):

Cys-(X)2-4-Cys-X.X.X.X.X.X.X.X.X.X.X.X-His-(X)3-5-His (SEQ. ID. No:8)
                        -1 1 2 3 4 5 6 7

The two invariant histidine residues and two invariant cysteine residues in a single
beta turn are co-ordinated through zinc (see, e.g., Berg & Shi, Science 271, 1081-1085
(1996)). The above motif shows a numbering convention that is standard in the field for
the region of a zinc finger conferring binding specificity. The amino acid on the left (N-
terminal side) of the first invariant His residue is assigned the number +6, and other
amino acids further to the left are assigned successively decreasing numbers. The alpha
helix begins at residue 1 and extends to the residue following the second conserved
histidine. The entire helix is therefore of variable length, between 11 and 13 residues.

The process of designing or selecting a nonnaturally occurring or variant ZFP
typically starts with a natural ZFP as a source of framework residues. The process of
design or selection serves to define nonconserved positions (i.e., positions -1 to +6) so as
to confer a desired binding specificity. One suitable ZFP is the DNA binding domain of 
the mouse transcription factor Zif268. The DNA binding domain of this protein has the
amino acid sequence:

YACPVESCDRRFSRSDELTRHIRIHTGQKP (F1) (SEQ. ID No:9)
FQCRICMRNFSRSDHLTTHIRTHTGEKP (F2) (SEQ. ID. No:10)
FACDICGRKFARSDERKRHTKIHLRQK (F3) (SEQ. ID. No:11)

and binds to a target 5' GCG TGG GCG 3' (SEQ ID No:12).
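The numbering convention can be applied mechanically. Locate the C2H2 motif, then take the seven residues immediately preceding the first invariant histidine as positions -1 and +1 to +6. A minimal Python sketch, assuming each finger contains one motif matching the (SEQ. ID. No:8) pattern with any amino acid at each X. `recognition_residues` is a hypothetical helper, checked here against the Zif268 fingers listed above.

```python
import re

# Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His; the 12 residues between the second
# Cys and the first His are captured so that the last seven can be numbered.
C2H2_MOTIF = re.compile(r"C.{2,4}C(.{12})H.{3,5}H")
POSITIONS = ("-1", "+1", "+2", "+3", "+4", "+5", "+6")


def recognition_residues(finger):
    """Map positions -1..+6 to residues for one C2H2 finger sequence."""
    match = C2H2_MOTIF.search(finger)
    if match is None:
        raise ValueError("no C2H2 motif found")
    helix = match.group(1)[-7:]  # residue +6 sits just before the first His
    return dict(zip(POSITIONS, helix))


if __name__ == "__main__":
    zif268 = {
        "F1": "YACPVESCDRRFSRSDELTRHIRIHTGQKP",
        "F2": "FQCRICMRNFSRSDHLTTHIRTHTGEKP",
        "F3": "FACDICGRKFARSDERKRHTKIHLRQK",
    }
    for name, seq in zif268.items():
        print(name, "".join(recognition_residues(seq).values()))
    # F1 RSDELTR, F2 RSDHLTT, F3 RSDERKR
```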

Another suitable natural zinc finger protein as a source of framework residues is
Sp-1. The Sp-1 sequence used for construction of zinc finger proteins corresponds to
amino acids 531 to 624 in the Sp-1 transcription factor. This sequence is 94 amino acids
in length. The amino acid sequence of Sp-1 is as follows:

PGKKKQHICHIQGCGKVYGKTSHLRAHLRWHTGERP
FMCTWSYCGKRFTRSDELQRHKRTHTGEKK
FACPECPKRFMRSDHLSKHIKTHQNKKG (SEQ. ID. No:13)

Sp-1 binds to a target site 5'GGG GCG GGG3' (SEQ ID No:14).

An alternate form of Sp-1, an Sp-1 consensus sequence, has the following amino
acid sequence:

meklrngsgd
PGKKKQHACPECGKSFSKSSHLRAHQRTHTGERP
YKCPECGKSFSRSDELQRHQRTHTGEKP
YKCPECGKSFSRSDHLSKHQRTHQNKKG (SEQ. ID. No:15)

(lower case letters are a leader sequence from Shi & Berg, Chemistry and Biology 1,
83-89 (1995)). The optimal binding sequence for the Sp-1 consensus sequence is
5'GGGGCGGGG3' (SEQ ID No:16). Other suitable ZFPs are described below.

There are a number of substitution rules that assist rational design of some zinc
finger proteins (see Desjarlais & Berg, PNAS 90, 2256-2260 (1993); Choo & Klug, PNAS
91, 11163-11167 (1994); Desjarlais & Berg, PNAS 89, 7345-7349 (1992); Jamieson et
al., supra; Choo et al., WO 98/53057, WO 98/53058; WO 98/53059; WO 98/53060).
Many of these rules are supported by site-directed mutagenesis of the three-finger domain
of the ubiquitous transcription factor, Sp-1 (Desjarlais and Berg, 1992; 1993). One of
these rules is that a 5'G in a DNA triplet can be bound by a zinc finger incorporating
arginine at position 6 of the recognition helix. Another substitution rule is that a G in the
middle of a subsite can be recognized by including a histidine residue at position 3 of a
zinc finger. A further substitution rule is that asparagine can be incorporated to recognize
A in the middle of a triplet; aspartic acid, glutamic acid, serine or threonine can be
incorporated to recognize C in the middle of a triplet; and amino acids with small side
chains, such as alanine, can be incorporated to recognize T in the middle of a triplet. A
further substitution rule is that the 3' base of a triplet subsite can be recognized by
incorporating the following amino acids at position -1 of the recognition helix: arginine
to recognize G, glutamine to recognize A, glutamic acid (or aspartic acid) to recognize C,
and threonine to recognize T. Although these substitution rules are useful in designing
zinc finger proteins, they do not take into account all possible target sites. Furthermore,
the assumption underlying the rules, namely that a particular amino acid in a zinc finger
is responsible for binding to a particular base in a subsite, is only approximate. Context-
dependent interactions between proximate amino acids in a finger or binding of multiple
amino acids to a single base or vice versa can cause variation of the binding specificities
predicted by the existing substitution rules.
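The following Python sketch illustrates how the substitution rules can be applied, and where they stop. It predicts a triplet from residues -1, +3 and +6 only. It assumes the seven residues are given in -1..+6 order and encodes only the rules listed above, with alanine standing in for "small side chains". It writes N wherever no rule applies, so it is a toy, not a validated recognition code.

```python
# Rules as stated in the text; only these residues are encoded.
RULES_MINUS1 = {"R": "G", "Q": "A", "E": "C", "D": "C", "T": "T"}  # 3' base
RULES_PLUS3 = {"H": "G", "N": "A", "D": "C", "E": "C", "S": "C", "T": "C",
               "A": "T"}                                           # middle base
RULES_PLUS6 = {"R": "G"}                                           # 5' base


def predict_triplet(helix):
    """Predict the 5'->3' triplet for residues at positions -1, +1..+6.

    Uses only positions -1, +3 and +6; 'N' marks a base no rule covers.
    """
    if len(helix) != 7:
        raise ValueError("expected 7 residues (positions -1 to +6)")
    minus1, plus3, plus6 = helix[0], helix[3], helix[6]
    return (RULES_PLUS6.get(plus6, "N")
            + RULES_PLUS3.get(plus3, "N")
            + RULES_MINUS1.get(minus1, "N"))


if __name__ == "__main__":
    # Zif268 fingers bind 5'GCG3' (F1), 5'TGG3' (F2) and 5'GCG3' (F3).
    for helix in ("RSDELTR", "RSDHLTT", "RSDERKR"):
        print(helix, predict_triplet(helix))
    # RSDELTR GCG, RSDHLTT NGG (no rule covers Thr at +6), RSDERKR GCG
```

The same function returns NCG for the RSDSLTS finger of SBS# 201, which the text reports binds 5'TTG3'. This mismatch is a concrete case of the rules being only approximate.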

The technique of phage display provides a largely empirical means of generating
zinc finger proteins with a desired target specificity (see e.g., Rebar, US 5,789,538; Choo
et al., WO 96/06166; Barbas et al., WO 95/19431 and WO 98/543111; Jamieson et al.,
supra). The method can be used in conjunction with, or as an alternative to, rational
design. The method involves the generation of diverse libraries of mutagenized zinc
finger proteins, followed by the isolation of proteins with desired DNA-binding
properties using affinity selection methods. To use this method, the experimenter
typically proceeds as follows. First, a gene for a zinc finger protein is mutagenized to
introduce diversity into regions important for binding specificity and/or affinity. In a
typical application, this is accomplished via randomization of a single finger at positions
-1, +2, +3, and +6, and sometimes accessory positions such as +1, +5, +8 and +10. Next,
the mutagenized gene is cloned into a phage or phagemid vector as a fusion with gene III
of a filamentous phage, which encodes the coat protein pIII. The zinc finger gene is
inserted between segments of gene III encoding the membrane export signal peptide and
the remainder of pIII, so that the zinc finger protein is expressed as an amino-terminal
fusion with pIII in the mature, processed protein. When using phagemid vectors, the
mutagenized zinc finger gene may also be fused to a truncated version of gene III
encoding, minimally, the C-terminal region required for assembly of pIII into the phage
particle. The resultant vector library is transformed into E. coli and used to produce
filamentous phage which express variant zinc finger proteins on their surface as fusions
with the coat protein pIII. If a phagemid vector is used, then this step requires
superinfection with helper phage. The phage library is then incubated with target DNA
site, and affinity selection methods are used to isolate phage which bind target with high
affinity from bulk phage. Typically, the DNA target is immobilized on a solid support,
which is then washed under conditions sufficient to remove all but the tightest binding
phage. After washing, any phage remaining on the support are recovered via elution
under conditions which disrupt zinc finger - DNA binding. Recovered phage are used to
infect fresh E. coli, which is then amplified and used to produce a new batch of phage
particles. Selection and amplification are then repeated as many times as is necessary to
enrich the phage pool for tight binders such that these may be identified using sequencing
and/or screening methods. Although the method is illustrated for pIII fusions, analogous
principles can be used to screen ZFP variants as pVIII fusions.
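A toy model shows why repeated selection and amplification enrich the pool so quickly. The sketch below assumes, purely for illustration, three things. Each round retains a fixed fraction of tight binders and a much smaller fixed fraction of other phage. Amplification rescales all survivors equally. The retention values used are hypothetical, not measured.

```python
def enrichment_history(f0, keep_binder, keep_other, rounds):
    """Fraction of tight binders in the pool after each round.

    Toy model: each wash retains fixed fractions of binders and of other
    phage; amplification then scales all survivors equally, so only the
    relative proportions carry into the next round.
    """
    history = []
    f = f0
    for _ in range(rounds):
        binders = f * keep_binder
        others = (1.0 - f) * keep_other
        f = binders / (binders + others)
        history.append(f)
    return history


if __name__ == "__main__":
    # Hypothetical: 1 binder per 10^6 phage, 10% vs 0.01% retention per round.
    for n, frac in enumerate(enrichment_history(1e-6, 0.1, 1e-4, 4), start=1):
        print(f"round {n}: binder fraction {frac:.3g}")
```

Under these assumed numbers the binder-to-nonbinder ratio grows 1000-fold per round, so a pool starting at one binder per million phage is dominated by binders after about three rounds.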

In certain embodiments, the sequence bound by a particular zinc finger protein is
determined by conducting binding reactions (see, e.g., conditions for determination of Kd,
infra) between the protein and a pool of randomized double-stranded oligonucleotide
sequences. The binding reaction is analyzed by an electrophoretic mobility shift assay
(EMSA), in which protein-DNA complexes undergo retarded migration in a gel and can
be separated from unbound nucleic acid. Oligonucleotides which have bound the finger
are purified from the gel and amplified, for example, by a polymerase chain reaction.
The selection (i.e., binding reaction and EMSA analysis) is then repeated as many times
as desired, with the selected oligonucleotide sequences. In this way, the binding 
specificity of a zinc finger protein having a particular amino acid sequence is determined. 
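Once the selected oligonucleotides have been sequenced, a per-position base count and a consensus give one simple summary of the binding specificity. A minimal sketch, assuming the sequenced sites have already been trimmed and aligned to equal length. The 0.5 threshold for calling a base (otherwise N) is an arbitrary illustrative choice, and the example sites are invented.

```python
from collections import Counter


def position_counts(sites):
    """Per-position base counts for equal-length, aligned sequences."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("sites must be aligned to equal length")
    return [Counter(site[i] for site in sites) for i in range(length)]


def consensus(sites, min_fraction=0.5):
    """Most common base per position, or 'N' if below min_fraction."""
    calls = []
    for counts in position_counts(sites):
        base, n = counts.most_common(1)[0]
        calls.append(base if n / len(sites) >= min_fraction else "N")
    return "".join(calls)


if __name__ == "__main__":
    selected = ["GCGTGGGCG", "GCGTGGGCG", "GCGTAGGCG", "GCGTGGGCA"]  # made up
    print(consensus(selected))  # GCGTGGGCG
```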

Zinc finger proteins are often expressed with a heterologous domain as fusion 
proteins. Common domains for addition to the ZFP include, e.g., transcription factor 
domains (activators, repressors, co-activators, co-repressors), silencers, oncogenes (e.g., 
myc, jun, fos, myb, max, mad, rel, ets, bcl, myb, mos family members etc.); DNA repair 
enzymes and their associated factors and modifiers; DNA rearrangement enzymes and 
their associated factors and modifiers; chromatin associated proteins and their modifiers 
(e.g. kinases, acetylases and deacetylases); and DNA modifying enzymes (e.g., 
methyltransferases, topoisomerases, helicases, ligases, kinases, phosphatases, 
polymerases, endonucleases) and their associated factors and modifiers. A preferred
domain for fusing with a ZFP when the ZFP is to be used for repressing expression of a
target gene is a KRAB repression domain from the human KOX-1 protein (Thiesen et al.,
New Biologist 2, 363-374 (1990); Margolin et al., Proc. Natl. Acad. Sci. USA 91,
4509-4513 (1994); Pengue et al., Nucl. Acids Res. 22:2908-2914 (1994); Witzgall et al.,
Proc. Natl. Acad. Sci. USA 91, 4514-4518 (1994)). Preferred domains for achieving
activation include the HSV VP16 activation domain (see, e.g., Hagmann et al., J. Virol.
71, 5952-5962 (1997)); nuclear hormone receptors (see, e.g., Torchia et al., Curr. Opin.
Cell Biol. 10:373-383 (1998)); the p65 subunit of nuclear factor kappa B (Bitko & Barik,
J. Virol. 72:5610-5618 (1998); Doyle & Hunt, Neuroreport 8:2937-2942 (1997); Liu et
al., Cancer Gene Ther. 5:3-28 (1998)); or artificial chimeric functional domains such as
VP64 (Seipel et al., EMBO J. 11, 4961-4968 (1992)).

An important factor in the administration of polypeptide compounds, such as the 
ZFPs, is ensuring that the polypeptide has the ability to traverse the plasma membrane of 
a cell, or the membrane of an intra-cellular compartment such as the nucleus. Cellular 
membranes are composed of lipid-protein bilayers that are freely permeable to small,
nonionic lipophilic compounds and are inherently impermeable to polar compounds, 
macromolecules, and therapeutic or diagnostic agents. However, proteins and other 
compounds such as liposomes have been described, which have the ability to translocate 
polypeptides such as ZFPs across a cell membrane. 

For example, "membrane translocation polypeptides" have amphiphilic or 
hydrophobic amino acid subsequences that have the ability to act as membrane- 
translocating carriers. In one embodiment, homeodomain proteins have the ability to 
translocate across cell membranes. The shortest internalizable peptide of a homeodomain 
protein, Antennapedia, was found to be the third helix of the protein, from amino acid 
position 43 to 58 (see, e.g., Prochiantz, Current Opinion in Neurobiology 6:629-634 
(1996)). Another subsequence, the h (hydrophobic) domain of signal peptides, was found 
to have similar cell membrane translocation characteristics (see, e.g., Lin et al., J. Biol.
Chem. 270:14255-14258 (1995)).

Examples of peptide sequences which can be linked to a ZFP, for facilitating 
uptake of ZFP into cells, include, but are not limited to: an 11 amino acid peptide of the
tat protein of HIV; a 20 residue peptide sequence which corresponds to amino acids
84-103 of the p16 protein (see Fahraeus et al., Current Biology 6:84 (1996)); the third
helix of the 60-amino acid long homeodomain of Antennapedia (Derossi et al., J. Biol.
Chem. 269:10444 (1994)); the h region of a signal peptide such as the Kaposi fibroblast
growth factor (K-FGF) h region (Lin et al., supra); or the VP22 translocation domain from
HSV (Elliott & O'Hare, Cell 88:223-233 (1997)). Other suitable chemical moieties that
provide enhanced cellular uptake may also be chemically linked to ZFPs. 

Toxin molecules also have the ability to transport polypeptides across cell 
membranes. Often, such molecules are composed of at least two parts (called "binary 
toxins"): a translocation or binding domain or polypeptide and a separate toxin domain 
or polypeptide. Typically, the translocation domain or polypeptide binds to a cellular 
receptor, and then the toxin is transported into the cell. Several bacterial toxins, 
including Clostridium perfringens iota toxin, diphtheria toxin (DT), Pseudomonas 
exotoxin A (PE), pertussis toxin (PT), Bacillus anthracis toxin, and pertussis adenylate 
cyclase (CYA), have been used in attempts to deliver peptides to the cell cytosol as 
internal or amino-terminal fusions (Arora et al., J. Biol. Chem. 268:3334-3341 (1993);
Perelle et al., Infect. Immun. 61:5147-5156 (1993); Stenmark et al., J. Cell Biol.
113:1025-1032 (1991); Donnelly et al., PNAS 90:3530-3534 (1993); Carbonetti et al.,
Abstr. Annu. Meet. Am. Soc. Microbiol. 95:295 (1995); Sebo et al., Infect. Immun.
63:3851-3857 (1995); Klimpel et al., PNAS U.S.A. 89:10277-10281 (1992); and Novak et
al., J. Biol. Chem. 267:17186-17193 (1992)).

Such subsequences can be used to translocate ZFPs across a cell membrane. 
ZFPs can be conveniently fused to or derivatized with such sequences. Typically, the 
translocation sequence is provided as part of a fusion protein. Optionally, a linker can be
used to link the ZFP and the translocation sequence. Any suitable linker can be used,
e.g., a peptide linker.

III. Position Dependence Of Subsite Recognition By Zinc Fingers 

A number of the polypeptides disclosed herein have been characterized using the
methods disclosed in parent application Serial No. 09/716,637 (the disclosure of which is
hereby incorporated by reference in its entirety), in particular with respect to the effect of
their position, within a multi-finger protein, on their sequence specificity. The results of
these investigations provide a set of zinc finger sequences that are optimized for
recognition of certain triplet target subsites whose 5'-most nucleotide is a G (i.e., GNN
triplet subsites). Thus, particular zinc finger sequences which recognize each of the GNN
triplet subsites, from each position of a three-finger zinc finger protein, are provided. See
Figure 2. It will be clear to those of skill in the art that the optimized, position-specific
zinc finger sequences disclosed herein for recognition of GNN target subsites are not
limited to use in three-finger proteins. For example, they are also useful in six-finger
proteins, which can be made by linkage of two three-finger proteins.

A number of zinc finger amino acid sequences which are reported to bind to target
subsites in which the 5'-most nucleotide residue is G (i.e., GNN subsites) have recently
been disclosed (Segal et al. (1999) Proc. Natl. Acad. Sci. USA 96:2758-2763; Dreier et
al. (2000) J. Mol. Biol. 303:489-502; U.S. Patent No. 6,140,081). These GNN-binding
zinc fingers were obtained by selection of finger 2 sequences from phage display libraries
of three-finger proteins, in which certain amino acid residues of finger 2 had been
randomized. Due to the manner in which they were selected, it is not clear whether these
sequences would have the same target subsite specificity if they were present in the F1
and/or F3 positions.

Use of the methods and compositions disclosed herein has now allowed 
identification of specific zinc finger sequences that bind each of the 16 GNN triplet 
subsites, and for the first time, provides zinc finger sequences that are optimized for 
recognition of these triplet subsites in a position-dependent fashion. Moreover, in vivo 
studies of these optimized designs reveal that the functionality of a ZFP is correlated with 
its binding affinity to its target sequence. See Example 6, infra. 

As a result of the discovery, disclosed herein, that sequence recognition by zinc 
fingers is position-dependent, it is clear that existing design rules will not, in and of 
themselves, be applicable to every situation in which it is necessary to construct a 
sequence-specific ZFP. The results disclosed herein show that many zinc fingers that are 
constructed based on design rules exhibit the sequence specificity predicted by those 
design rules only at certain finger positions. The position-specific zinc fingers disclosed 
herein are likely to function more efficiently in vivo and in cultured cells, with fewer 
nonspecific effects. Highly specific ZFPs, made using position-specific zinc fingers, will 
be useful tools in studying gene function and will find broad applications in areas as 
diverse as human therapeutics and plant engineering. 
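Position-dependent recognition means that a design table must be keyed by both finger position and triplet, not by triplet alone. The Python sketch below illustrates the idea with entries drawn only from examples in this text: the three Zif268 fingers with their 5'GCG TGG GCG3' target, and the three SBS# 201 fingers. The position-specific GNN fingers of Figure 2 are not reproduced here, and `FINGER_TABLE` and `design_three_finger` are hypothetical names.

```python
# Keys are (finger position, 5'->3' triplet); entries come from this text only.
FINGER_TABLE = {
    (1, "GCG"): "RSDELTR",  # Zif268 finger 1
    (2, "TGG"): "RSDHLTT",  # Zif268 finger 2
    (3, "GCG"): "RSDERKR",  # Zif268 finger 3
    (1, "TTG"): "RSDSLTS",  # SBS# 201 finger 1
    (2, "GCC"): "ERSTLTR",  # SBS# 201 finger 2
    (3, "GCA"): "QRADLRR",  # SBS# 201 finger 3
}


def design_three_finger(target_5to3, table=FINGER_TABLE):
    """Return helices (positions -1..+6) for F1, F2, F3 of a 9 bp target."""
    if len(target_5to3) != 9:
        raise ValueError("expected a 9 bp target")
    # F3 binds the 5'-most triplet and F1 the 3'-most triplet.
    triplet_for = {3: target_5to3[0:3], 2: target_5to3[3:6], 1: target_5to3[6:9]}
    helices = []
    for position in (1, 2, 3):
        key = (position, triplet_for[position])
        if key not in table:
            raise LookupError(f"no finger for 5'{key[1]}3' at position F{position}")
        helices.append(table[key])
    return helices


if __name__ == "__main__":
    print(design_three_finger("GCGTGGGCG"))  # ['RSDELTR', 'RSDHLTT', 'RSDERKR']
```

The same triplet, 5'GCG3', maps to different helices at F1 and F3, and a triplet with no entry for the requested position raises an error instead of borrowing a finger characterized at another position.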

IV. Production of Zinc Finger Proteins 

ZFP polypeptides and nucleic acids encoding the same can be made using routine 
techniques in the field of recombinant genetics. Basic texts disclosing the general 
methods include Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd ed. 
1989); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and 
Current Protocols in Molecular Biology (Ausubel et al., eds., 1994). In addition,
nucleic acids less than about 100 bases can be custom ordered from any of a variety of
commercial sources, such as The Midland Certified Reagent Company
(mcrc@oligos.com), The Great American Gene Company (http://www.genco.com),
ExpressGen Inc. (www.expressgen.com), and Operon Technologies Inc. (Alameda, CA).
Similarly, peptides can be custom ordered from any of a variety of sources, such as
PeptidoGenic (pkim@ccnet.com), HTI Bio-products, Inc. (http://www.htibio.com), BMA
Biomedicals Ltd (U.K.), and Bio.Synthesis, Inc.

Oligonucleotides can be chemically synthesized according to the solid phase 
phosphoramidite triester method first described by Beaucage & Caruthers, Tetrahedron
Letts. 22:1859-1862 (1981), using an automated synthesizer, as described in Van
Devanter et al., Nucleic Acids Res. 12:6159-6168 (1984). Purification of
oligonucleotides is by either denaturing polyacrylamide gel electrophoresis or by reverse
phase HPLC. The sequence of the cloned genes and synthetic oligonucleotides can be
verified after cloning using, e.g., the chain termination method for sequencing
double-stranded templates of Wallace et al., Gene 16:21-26 (1981).

Two alternative methods are typically used to create the coding sequences 
required to express newly designed DNA-binding peptides. One protocol is a PCR-based 
assembly procedure that utilizes six overlapping oligonucleotides (Fig. 1). Three 
oligonucleotides (oligos 1, 3, and 5 in Figure 1) correspond to "universal" sequences that
encode portions of the DNA-binding domain between the recognition helices. These
oligonucleotides typically remain constant for all zinc finger constructs. The other three
"specific" oligonucleotides (oligos 2, 4, and 6 in Fig. 1) are designed to encode the
recognition helices. These oligonucleotides contain substitutions primarily at positions -1,
+2, +3 and +6 on the recognition helices, making them specific for each of the different
DNA-binding domains.

The PCR synthesis is carried out in two steps. First, a double stranded DNA 
template is created by combining the six oligonucleotides (three universal, three specific) 
in a four cycle PCR reaction with a low temperature annealing step, thereby annealing the 
oligonucleotides to form a DNA "scaffold." The gaps in the scaffold are filled in by a
high-fidelity thermostable polymerase; a combination of Taq and Pfu polymerases also
suffices. In the second phase of construction, the zinc finger template is amplified by
external primers designed to incorporate restriction sites at either end for cloning into a 
shuttle vector or directly into an expression vector. 

An alternative method of cloning the newly designed DNA-binding proteins relies 
on annealing complementary oligonucleotides encoding the specific regions of the
desired ZFP. This particular application requires that the oligonucleotides be 
phosphorylated prior to the final ligation step. This is usually performed before setting
up the annealing reactions. In brief, the "universal" oligonucleotides encoding the 
constant regions of the proteins (oligos 1, 3 and 5 above) are annealed with their
complementary oligonucleotides. Additionally, the "specific" oligonucleotides encoding 
the finger recognition helices are annealed with their respective complementary 
oligonucleotides. These complementary oligos are designed to fill in the region which 
was previously filled in by polymerase in the above-mentioned protocol. The 
complementary oligos to common oligo 1 and to the finger 3 oligo are engineered to leave
overhanging sequences specific for the restriction sites used in cloning into the vector of
choice in the following step. The second assembly protocol differs from the initial
protocol in the following aspects: the "scaffold" encoding the newly designed ZFP is
composed entirely of synthetic DNA, thereby eliminating the polymerase fill-in step;
additionally, the fragment to be cloned into the vector does not require amplification.
Lastly, the design of leaving sequence-specific overhangs eliminates the need for
restriction enzyme digests of the inserting fragment. Alternatively, changes to ZFP
recognition helices can be created using conventional site-directed mutagenesis methods. 
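Designing a "specific" oligonucleotide starts from back-translating its recognition helix, and the annealing protocol also needs the complementary strand. A minimal Python sketch, assuming one arbitrarily chosen codon per amino acid. A real design would also weigh codon usage, overlap with the universal oligos and restriction sites, none of which is modeled here.

```python
# One illustrative codon per amino acid (an assumption, not the authors' choice).
CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
AMINO_ACID = {codon: aa for aa, codon in CODON.items()}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def back_translate(peptide):
    """Encode a peptide using the single codon chosen for each residue."""
    return "".join(CODON[aa] for aa in peptide)


def translate(dna):
    """Decode DNA built from the codons above (other codons are not handled)."""
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(AMINO_ACID[dna[i:i + 3]] for i in range(0, len(dna), 3))


def reverse_complement(dna):
    """Complementary strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    coding = back_translate("RSDELTR")  # Zif268 finger 1, positions -1..+6
    print(coding)                        # CGTAGCGATGAACTGACCCGT
    print(reverse_complement(coding))
    assert translate(coding) == "RSDELTR"
```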

Both assembly methods require that the resulting fragment encoding the newly 
designed ZFP be ligated into a vector. Ultimately, the ZFP-encoding sequence is cloned 
into an expression vector. Expression vectors that are commonly utilized include, but are 
not limited to, a modified pMAL-c2 bacterial expression vector (New England BioLabs)
or a eukaryotic expression vector, pcDNA (Promega). The final constructs are verified
by sequence analysis. 

Any suitable method of protein purification known to those of skill in the art can 
be used to purify ZFPs (see, Ausubel, supra, Sambrook, supra). In addition, any suitable 
host can be used for expression, e.g., bacterial cells, insect cells, yeast cells, mammalian 
cells, and the like. 

Expression of a zinc finger protein fused to a maltose binding protein (MBP-ZFP) 
in bacterial strain JM109 allows for straightforward purification through an amylose 
column (NEB). High expression levels of the zinc finger chimeric protein can be 
obtained by induction with IPTG since the MBP-ZFP fusion in the pMal-c2 expression 
plasmid is under the control of the tac promoter (NEB). Bacteria containing the MBP- 
ZFP fusion plasmids are inoculated into 2xYT medium containing 10 µM ZnCl2, 0.02%
glucose, plus 50 µg/ml ampicillin and shaken at 37°C. At mid-exponential growth IPTG
is added to 0.3 mM and the cultures are allowed to shake. After 3 hours the bacteria are
harvested by centrifugation, disrupted by sonication or by passage through a French
pressure cell or through the use of lysozyme, and insoluble material is removed by
centrifugation. The MBP-ZFP proteins are captured on an amylose-bound resin, washed
extensively with buffer containing 20 mM Tris-HCl (pH 7.5), 200 mM NaCl, 5 mM DTT
and 50 nM ZnCl2, then eluted with maltose in essentially the same buffer (purification is
based on a standard protocol from NEB). Purified proteins are quantitated and stored for
biochemical analysis.
The dissociation constants of the purified proteins, e.g., Kd, are typically
characterized via electrophoretic mobility shift assays (EMSA) (Buratowski & Chodosh,
in Current Protocols in Molecular Biology pp. 12.2.1-12.2.7 (Ausubel ed., 1996)).
Affinity is measured by titrating purified protein against a fixed amount of labeled
double-stranded oligonucleotide target. The target typically comprises the natural
binding site sequence flanked by the 3 bp found in the natural sequence and additional,
constant flanking sequences. The natural binding site is typically 9 bp for a three-finger
protein and 2 x 9 bp + intervening bases for a six-finger ZFP. The annealed
oligonucleotide targets possess a 1 base 5' overhang which allows for efficient labeling
of the target with T4 phage polynucleotide kinase. For the assay the target is added at a
concentration of 1 nM or lower (the actual concentration is kept at least 10-fold lower
than the expected dissociation constant), purified ZFPs are added at various
concentrations, and the reaction is allowed to equilibrate for at least 45 min. The
reaction mixture also contains 10 mM Tris (pH 7.5), 100 mM KCl, 1 mM MgCl2, 0.1
mM ZnCl2, 5 mM DTT, 10% glycerol, 0.02% BSA. (NB: in earlier assays poly d(IC)
was also added at 10-100 ng/µl.)

The equilibrated reactions are loaded onto a 10% polyacrylamide gel, which has 
been pre-run for 45 min in Tris/glycine buffer, then bound and unbound labeled target is 
resolved by electrophoresis at 150V. (alternatively, 10-20% gradient Tris-HCl gels, 

30 containing a 4% polyacrylamide stacker, can be used) The dried gels are visualized by 
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autoradiography or phosphorimaging and the apparent Kd is determined by calculating 
the protein concentration that gives half-maximal binding. 
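As an illustration of the half-maximal criterion, the sketch below assumes simple 1:1 binding between protein and labeled target and computes the fraction of target bound with the exact quadratic solution, so that depletion of free protein by the target is accounted for. It then estimates the apparent Kd by interpolating, on a log-concentration scale, the protein concentration at which half of the maximal observed binding is reached. The function names are hypothetical, and the 1:1 model and log-linear interpolation are assumptions of this sketch rather than part of the protocol above.

```python
import math


def fraction_bound(p_total, d_total, kd):
    """Fraction of labeled target in complex at equilibrium (1:1 binding).

    Exact quadratic solution, so depletion of free protein by the target is
    included. All concentrations in the same units (e.g. nM).
    """
    b = p_total + d_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * p_total * d_total)) / 2.0
    return complex_conc / d_total


def apparent_kd(concs, fractions):
    """Protein concentration giving half of the maximal observed binding.

    Interpolates linearly in log(concentration) between the two titration
    points that bracket the half-maximal value; concentrations must be > 0.
    """
    half = max(fractions) / 2.0
    points = sorted(zip(concs, fractions))
    for (c1, f1), (c2, f2) in zip(points, points[1:]):
        if f1 <= half <= f2:
            if f2 == f1:
                return c1
            t = (half - f1) / (f2 - f1)
            return math.exp(math.log(c1) + t * (math.log(c2) - math.log(c1)))
    raise ValueError("titration does not bracket half-maximal binding")
```

Simulating a 2-fold protein titration with the target at 0.1 nM and a true Kd of 10 nM gives an estimate close to 10 nM. Raising the target to 100 nM with a true Kd of 1 nM moves the half-maximal point to roughly 50 nM, which illustrates why the target is kept well below the expected Kd.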

The assays can also include determining active fractions in the protein 
preparations. Active fractions are determined by stoichiometric gel shifts where proteins 
are titrated against a high concentration of target DNA. Titrations are done at 100, 50, 
and 25% of target (usually at micromolar levels). 
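Because the target concentration in these titrations is far above Kd, binding is essentially stoichiometric: each active protein molecule occupies one site, so the fraction of target shifted, divided by the nominal protein-to-target ratio, estimates the active fraction. A minimal sketch of that arithmetic follows; the function names are hypothetical, and it assumes one binding site per target and that the shifted fraction has already been quantified from the gel.

```python
def active_fraction(protein_to_target_ratio, fraction_target_shifted):
    """Active fraction of a protein preparation from a stoichiometric gel shift.

    protein_to_target_ratio: nominal protein added as a fraction of the target
        concentration (e.g. 1.0, 0.5 and 0.25 for titrations at 100, 50 and 25%).
    fraction_target_shifted: fraction of labeled target observed as bound.
    Assumes target >> Kd, so each active protein binds one site.
    """
    if protein_to_target_ratio <= 0:
        raise ValueError("protein_to_target_ratio must be positive")
    return fraction_target_shifted / protein_to_target_ratio


def mean_active_fraction(measurements):
    """Average the estimates from several (ratio, fraction shifted) titration points."""
    estimates = [active_fraction(r, s) for r, s in measurements]
    return sum(estimates) / len(estimates)
```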

V. Applications of Engineered Zinc Finger Proteins 

ZFPs that bind to a particular target gene, and the nucleic acids encoding them,
can be used for a variety of applications. These applications include therapeutic methods
in which a ZFP or a nucleic acid encoding it is administered to a subject and used to
modulate the expression of a target gene within the subject. See, for example, co-owned
WO 00/41566. The modulation can be in the form of repression, for example, when the
target gene resides in a pathological infecting microorganism, or in an endogenous gene
of the patient, such as an oncogene or viral receptor, that is contributing to a disease state.
Alternatively, the modulation can be in the form of activation when activation of 
expression or increased expression of an endogenous cellular gene can ameliorate a 
diseased state. For such applications, ZFPs, or more typically, nucleic acids encoding 
them are formulated with a pharmaceutically acceptable carrier as a pharmaceutical 
composition. 

Pharmaceutically acceptable carriers are determined in part by the particular 
composition being administered, as well as by the particular method used to administer 
the composition (see, e.g., Remington's Pharmaceutical Sciences, 17th ed., 1985). The
ZFPs, alone or in combination with other suitable components, can be made into aerosol 
formulations (i.e., they can be "nebulized") to be administered via inhalation. Aerosol 
formulations can be placed into pressurized acceptable propellants, such as 
dichlorodifluoromethane, propane, nitrogen, and the like. Formulations suitable for 
parenteral administration, such as, for example, by intravenous, intramuscular, 
intradermal, and subcutaneous routes, include aqueous and non-aqueous, isotonic sterile 
injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that 
render the formulation isotonic with the blood of the intended recipient, and aqueous and 
non-aqueous sterile suspensions that can include suspending agents, solubilizers,
thickening agents, stabilizers, and preservatives. Compositions can be administered, for 
example, by intravenous infusion, orally, topically, intraperitoneally, intravesically or 
intrathecally. The formulations of compounds can be presented in unit-dose or multi- 
dose sealed containers, such as ampules and vials. Injection solutions and suspensions 
can be prepared from sterile powders, granules, and tablets of the kind previously 
described. 

The dose administered to a patient should be sufficient to effect a beneficial 
therapeutic response in the patient over time. The dose is determined by the efficacy and
Kd of the particular ZFP employed, the target cell, and the condition of the patient, as
well as the body weight or surface area of the patient to be treated. The size of the dose
also is determined by the existence, nature, and extent of any adverse side-effects that
accompany the administration of a particular compound or vector in a particular patient.

In other applications, ZFPs are used in diagnostic methods for sequence specific 
detection of target nucleic acid in a sample. For example, ZFPs can be used to detect 
variant alleles associated with a disease or phenotype in patient samples. As an example, 
ZFPs can be used to detect the presence of particular mRNA species or cDNA in a
complex mixture of mRNAs or cDNAs. As a further example, ZFPs can be used to
quantify the copy number of a gene in a sample. For example, detection of loss of one copy
of a p53 gene in a clinical sample is an indicator of susceptibility to cancer. In a further
example, ZFPs are used to detect the presence of pathological microorganisms in clinical
samples. This is achieved by using one or more ZFPs specific to genes within the
microorganism to be detected. A suitable format for performing diagnostic assays
employs ZFPs linked to a domain that allows immobilization of the ZFP on an ELISA
plate. The immobilized ZFP is contacted with a sample suspected of containing a target
nucleic acid under conditions in which binding can occur. Typically, nucleic acids in the
sample are labeled (e.g., in the course of PCR amplification). Alternatively, unlabeled
probes can be detected using a second labeled probe. After washing, bound labeled
nucleic acids are detected. 

ZFPs also can be used for assays to determine the phenotype and function of gene 
expression. Current methodologies for determination of gene function rely primarily 
upon either overexpression or removing (knocking out completely) the gene of interest
from its natural biological setting and observing the effects. The phenotypic effects
observed indicate the role of the gene in the biological system.

One advantage of ZFP-mediated regulation of a gene relative to conventional
knockout analysis is that expression of the ZFP can be placed under small molecule
control. By controlling expression levels of the ZFPs, one can in turn control the
expression levels of a gene regulated by the ZFP to determine what degree of repression
or stimulation of expression is required to achieve a given phenotypic or biochemical
effect. This approach has particular value for drug development. By putting the ZFP
under small molecule control, problems of embryonic lethality and developmental
compensation can be avoided by switching on the ZFP repressor at a later stage in mouse
development and observing the effects in the adult animal. Transgenic mice having
target genes regulated by a ZFP can be produced by integration of the nucleic acid
encoding the ZFP at any site in trans to the target gene. Accordingly, homologous
recombination is not required for integration of the nucleic acid. Further, because the
ZFP is trans-dominant, only one chromosomal copy is needed, and therefore functional
knock-out animals can be produced without backcrossing.

All references cited above are hereby incorporated by reference in their entirety
for all purposes.

Example 1: Initial design of zinc finger proteins and determination of
binding affinity

Initial ZFP designs were based on existing design rules, correspondence regimes
and ZFP directories, including those disclosed herein (see Tables 1-5) and also in
WO 98/53058; WO 98/53059; WO 98/53060 and co-owned US patent application
Serial No. 09/444,241. See also WO 00/42219. Amino acid sequences were
conceptually designed using amino acids 532-624 of the human transcription factor Sp1
as a backbone. Polynucleotides encoding designed ZFPs were assembled using a
Polymerase Chain Reaction (PCR)-based procedure that utilizes six overlapping
oligonucleotides. PCR products were directly cloned into the Tac promoter
vector, pMal-c2 (New England Biolabs, Beverly, MA) using the KpnI and BamHI
restriction sites. The encoded maltose binding protein-ZFP fusion polypeptides were
purified according to the manufacturer's procedures (New England Biolabs, Beverly,
MA). Binding affinity was measured by gel mobility-shift analysis. All of these
procedures are described in detail in co-owned WO 00/41566 and WO 00/42219, as well
as in Zhang et al. (2000) J. Biol. Chem. 275:33,850-33,860 and Liu et al. (2001) J. Biol.
Chem. 276:11,323-11,334, the disclosures of which are hereby incorporated by reference
in their entireties.

Example 2: Optimization of binding specificity by site selection

Designed ZFPs were tested for binding specificity using site selection methods 
disclosed in parent application USSN 09/716,637. Briefly, designed proteins were 
incubated with a population of labeled, double-stranded oligonucleotides comprising a 
library of all possible 9- or 10-nucleotide target sequences. Five nanomoles of labeled
oligonucleotides were incubated with protein, at a protein concentration 4-fold above its
Kd for its target sequence. The mixture was subjected to gel electrophoresis, and bound
oligonucleotides were identified by mobility shift and extracted from the gel. The
purified bound oligonucleotides were amplified, and the amplification products were
used for a subsequent round of selection. At each round of selection, the protein
concentration was decreased 2-fold. After 3-5 rounds of selection, amplification
products were cloned into the TOPO TA cloning vector (Invitrogen, Carlsbad, CA), and
the nucleotide sequences of approximately 20 clones were determined. The identities of
the target sites bound by a designed protein were determined from the sequences and
expressed as a compilation of subsite binding sequences.
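The final compilation step can be sketched as follows: each sequenced site is cut into its three triplet subsites, and the triplets seen at each finger position are tallied. This sketch is restricted to 9-bp sites, assumes the clone sequences have already been trimmed and oriented 5' to 3' on the same strand as the designed target, and assumes finger 3 contacts the 5'-most triplet and finger 1 the 3'-most (as for the EP2C target GCGGTGGCT in Example 5, whose finger 3 is intended to bind GCG). It also returns the protein concentration for each round, starting 4-fold above Kd and halving every round. Function names are hypothetical.

```python
from collections import Counter


def subsite_counts(clones):
    """Tally the triplet contacted by each finger across selected 9-bp sites.

    Finger 3 is taken to contact bases 1-3 (5' end), finger 2 bases 4-6,
    and finger 1 bases 7-9 (3' end).
    """
    counts = {"F3": Counter(), "F2": Counter(), "F1": Counter()}
    for seq in clones:
        seq = seq.upper()
        if len(seq) != 9:
            raise ValueError(f"expected a 9-bp site, got {seq!r}")
        counts["F3"][seq[0:3]] += 1
        counts["F2"][seq[3:6]] += 1
        counts["F1"][seq[6:9]] += 1
    return counts


def selection_schedule(kd_nm, rounds):
    """Protein concentration (nM) for each round: 4x Kd, halved every round."""
    return [4.0 * kd_nm / 2 ** i for i in range(rounds)]
```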

Example 3: Comparison of site selection results with binding affinity 

To test the correlation between site selection results and the affinity of binding of
a ZFP to various related targets, site selection experiments were conducted on two
three-finger ZFPs, denoted ZFP1 and ZFP2, and the site selection results were compared
with Kd measurements obtained from quantitative gel-mobility shift assays using the same
ZFPs and target sites. Each ZFP was constructed, based on design rules, to bind to a
particular nine-nucleotide target sequence (comprising three three-nucleotide subsites), as
shown in Figure 1. Site selection results and affinity measurements are also shown in
Figure 1. The site selection results showed that fingers 1 and 3 of both the ZFP1 and
ZFP2 proteins preferentially selected their intended target sequences. However, the
second finger of each ZFP preferentially selected subsites other than the one it was
designed to bind (e.g., F2 of ZFP1 was designed to bind TCG, but preferentially
selected GTG; F2 of ZFP2 was designed to bind GGT, but preferentially selected GGA).

To confirm the site selection results, binding affinities of ZFP1 and ZFP2 were
measured (see Example 1, supra), both to their original target sequences and to new
target sequences reflecting the site selection results. For example, the Mt-1 sequence
contains two base changes (compared to the original target sequence for ZFP1) which
result in a change in the sequence of the finger 2 subsite to GTG, reflecting the preferred
finger 2 subsite sequence obtained by site selection. In agreement with the site selection
results, binding of ZFP1 to the Mt-1 sequence is approximately 4-fold stronger than its
binding to the original target sequence (Kd of 12.5 nM compared to a Kd of 50 nM, see
Figure 1).

For ZFP2, the specificity of finger 2 for the 3' base of its target subsite was tested,
since, although this finger was designed to bind GGT, site selection indicated that it
bound preferentially to GGA. Moreover, the site selection results predicted that finger 2
of ZFP2 would bind with approximately equal affinity to GGT and GGC. Accordingly,
target sequences containing GGA (Mt-3) and GGC (Mt-4) at the finger 2 subsite were
constructed, and binding affinities of ZFP2 to these target sequences, and to its original
target sequence (containing GGT at the finger 2 subsite), were compared. In complete
agreement with the site selection results, ZFP2 exhibited the strongest binding affinity for
the target sequence containing GGA at the finger 2 subsite (Kd of 0.5 nM, Figure 1), and
its affinity for target sequences containing either GGT or GGC at the finger 2 subsite was
approximately equal (Kd of 1 nM for both targets, Figure 1). Accordingly, the site
selection method, in addition to being useful for iterative optimization of binding
specificity, can also serve as a useful indicator of binding affinity.
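The comparison of site selection preferences with measured affinities amounts to ranking each site by Kd and expressing it relative to the original design. The sketch below does this for the Kd values quoted in this example; the helper name and site labels are illustrative only.

```python
def rank_sites(kds_nm, reference):
    """Rank target sites from tightest to weakest binding.

    kds_nm: mapping of site label -> measured Kd in nM.
    reference: label of the originally designed site.
    Returns (label, fold tighter than reference) tuples; values above 1
    mean the site is bound more tightly than the reference.
    """
    ref_kd = kds_nm[reference]
    ranked = sorted(kds_nm.items(), key=lambda item: item[1])
    return [(label, ref_kd / kd) for label, kd in ranked]


# Kd values (nM) reported in this example.
ZFP1_KDS = {"original": 50.0, "Mt-1": 12.5}
ZFP2_KDS = {"original (GGT)": 1.0, "Mt-3 (GGA)": 0.5, "Mt-4 (GGC)": 1.0}
```

For ZFP1, Mt-1 ranks first at 4-fold tighter than the original site; for ZFP2, Mt-3 (GGA) ranks first at 2-fold tighter, with the GGT and GGC sites tied.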
Example 4: Use of site selection to identify position-dependent, GNN-binding
zinc fingers

A large number of engineered ZFPs have been evaluated, by site selection, to
identify zinc fingers that bind to GNN target subsites. In the course of these studies, it
became apparent that the binding specificity of a particular zinc finger sequence is, in
some instances, dependent upon the position of the zinc finger in the protein, and hence
upon the location of the target subsite within the target sequence. For example, if one
wishes to design a three-finger zinc finger protein to bind to a target sequence containing
the triplet subsite GAT, it is necessary to know whether this subsite is the first, second or
third subsite in the target sequence (i.e., whether the GAT subsite will be bound by the
first, second or third finger of the protein). Accordingly, over 110 three-finger zinc
finger proteins, containing potential GNN-recognizing zinc fingers in various locations,
have been evaluated by site selection experiments. Generally, several zinc finger
sequences were designed to recognize each GNN triplet, and each design was tested in
each of the F1, F2 and F3 positions through 4 to 6 rounds of selection.

The results of these analyses, shown in Figure 2, provide optimal position-dependent
zinc finger sequences (the sequences shown represent amino acid residues -1
through +6 of the recognition helix portion of the finger) for recognition of the 16 GNN
target subsites, as well as site selection results for these GNN-specific zinc fingers.
Optimal amino acid sequences for recognition of each GNN subsite from each of three
positions (finger 1, finger 2 or finger 3) are thereby provided.

GNG-binding finger designs
The amino acid sequence RSDXLXR (positions -1 to +6 of the recognition helix)
was found to be optimal for binding to GNG triplets, with Asn +3 specifying A as
the middle nucleotide; His +3 specifying G as the middle nucleotide; Ala +3 specifying T as
the middle nucleotide; and Asp +3 specifying C as the middle nucleotide. At the +5
position, Ala, Thr, Ser and Gln were tested, and all showed similar specificity profiles
by site selection. Interestingly, and in contrast to a previous report (Swirnoff et al. (1995)
Mol. Cell. Biol. 15:2275-2287), site selection results indicated that three naturally
occurring GCG-binding fingers from zif268 and Sp1, having the amino acid sequences
RSDELTR, RSDELQR, and RSDERKR, were not GCG-specific. Rather, each of these
fingers selected almost equal numbers of GCG and GTG sequences. Analysis of binding
affinity by gel-shift experiments confirmed that finger 3 of zif268, having the sequence
RSDERKR, binds GCG and GTG with approximately equal affinity.
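The RSDXLXR rule is compact enough to express as a lookup. This sketch, with hypothetical names, builds the seven-residue helix (-1 to +6) for any GNG triplet from the +3 assignments stated above and restricts +5 to the four residues that were tested; it encodes only the stated rule and says nothing about the position dependence discussed below.

```python
# Helix position +3 residue that specifies the middle base of a GNG triplet
# in the RSDXLXR framework (-1 to +6): Asn->A, His->G, Ala->T, Asp->C.
POS3_FOR_MIDDLE_BASE = {"A": "N", "G": "H", "T": "A", "C": "D"}

# Residues tested at +5; all gave similar site selection profiles.
TESTED_POS5 = ("A", "T", "S", "Q")


def gng_helix(triplet, pos5="A"):
    """Build the -1..+6 recognition helix R-S-D-[+3]-L-[+5]-R for a GNG triplet."""
    triplet = triplet.upper()
    if len(triplet) != 3 or triplet[0] != "G" or triplet[2] != "G":
        raise ValueError(f"{triplet!r} is not a GNG triplet")
    if triplet[1] not in POS3_FOR_MIDDLE_BASE:
        raise ValueError(f"unknown middle base in {triplet!r}")
    if pos5 not in TESTED_POS5:
        raise ValueError(f"+5 residue {pos5!r} was not among those tested")
    return "RSD" + POS3_FOR_MIDDLE_BASE[triplet[1]] + "L" + pos5 + "R"
```

For example, `gng_helix("GAG")` returns RSDNLAR and `gng_helix("GGG", "S")` returns RSDHLSR.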

Position dependence of GCA-, GAT-, GGT-, GAA- and GCC-binding fingers
Based on existing design rules, the amino acid sequence QSGDLTR (-1 through
+6) was tested for its ability to bind the GCA triplet from three positions (F1, F2, and F3)
within a three-finger ZFP. Figure 3A shows that the QSGDLTR sequence bound
preferentially to the GCA triplet subsite from the F2 and F3 positions, but not from F1.
In fact, the presence of QSGDLTR at the F1 position of three different three-finger ZFPs
resulted predominantly in selection of GCT. Accordingly, an attempt was made to
redesign this sequence to obtain specificity for GCA from the F1 position. Since the
sequence Q-1 S+3 R+6 had previously been selected from a randomized F1 library using
GCA as target (Rebar et al. (1994) Science 263:671-673), a D (Asp) to S (Ser) change was
made at the +3 residue of this finger. The resulting sequence, QSGSLTR, was tested for
its binding specificity by site selection and found to preferentially bind GCA, from the F1
position, in three different ZFPs (see Figure 2).

The QSGSLTR zinc finger, optimized for recognition of the GCA subsite from
the F1 position, was tested for its selectivity when located at the F2 position.
Accordingly, two ZFPs, one containing QSGSLTR at finger 2 and one containing
QSGDLTR at finger 2 (both having identical F1 sequences and identical F3 sequences),
were tested by site selection. The results indicated that, when used at the F2 position,
QSGSLTR bound preferentially to GTA, rather than GCA. Thus, for optimal binding of
a GCA triplet subsite from the F1 position, the amino acid sequence QSGSLTR is
required, while for optimal binding of the same subsite sequence from F2 or F3,
QSGDLTR should be used. Accordingly, different zinc finger amino acid sequences may
be needed to specify a particular triplet subsite sequence, depending upon the location of
the subsite within the target sequence and, hence, upon the position of the finger in the
protein.

Positional effects were also observed for zinc fingers recognizing GAT and GGT
subsites. The zinc finger amino acid sequence QSSNLAR (-1 through +6) is expected to
bind to GAT based on design rules. However, this sequence selected GAT only from the
F1 position, and not from the F2 and F3 positions, from which the sequence GAA was
preferentially bound (Figure 3B). Similarly, the amino acid sequence QSSHLTR which,
based on design rules, should bind GGT, selected GGT at the F1 position, but not at the
F2 and F3 positions, from which it preferentially bound GGA (Figure 3C). Conversely,
the amino acid sequence TSGHLVR has previously been disclosed to recognize the
triplet GGT, based on its selection from a randomized library of zif268 finger 2 (U.S.
Patent No. 6,140,081). However, TSGHLVR was not specific for the GGT subsite when
located at a position other than F2 (Figure 3C). These results indicate that the binding
specificity of many fingers is position dependent, and particularly point out that the
sequence specificity of a zinc finger selected from an F2 library may be positionally limited.
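The practical consequence is that a design table must be keyed by both the triplet and the finger position. The sketch below encodes only the position-specific choices stated in this example (the complete set is in Figure 2, which is not reproduced here) and looks them up for a 9-bp target, assuming finger 3 contacts the 5'-most triplet, as for EP2C in Example 5. The dictionary and function names are hypothetical; positions without a stated choice are returned as None rather than guessed.

```python
# Recognition helices (-1 to +6) whose position dependence is described in this
# example, keyed by (triplet, finger). Only combinations stated in the text are
# listed; the full position-dependent set is given in Figure 2.
POSITION_DEPENDENT_HELICES = {
    ("GCA", "F1"): "QSGSLTR",  # QSGDLTR selects GCT from F1
    ("GCA", "F2"): "QSGDLTR",  # QSGSLTR selects GTA from F2
    ("GCA", "F3"): "QSGDLTR",
    ("GAT", "F1"): "QSSNLAR",  # selects GAA from F2 and F3
    ("GGT", "F1"): "QSSHLTR",  # selects GGA from F2 and F3
}


def split_subsites(target):
    """Assign the triplets of a 9-bp target (written 5'->3') to fingers.

    Finger 3 contacts the 5'-most triplet and finger 1 the 3'-most one.
    """
    target = target.upper()
    if len(target) != 9:
        raise ValueError(f"expected a 9-bp target, got {target!r}")
    return {"F3": target[0:3], "F2": target[3:6], "F1": target[6:9]}


def choose_helices(target):
    """Return a helix per finger, or None where no position-specific choice is listed."""
    return {
        finger: POSITION_DEPENDENT_HELICES.get((triplet, finger))
        for finger, triplet in split_subsites(target).items()
    }
```

For a GCAGCAGCA target, this picks QSGDLTR for fingers 3 and 2 but QSGSLTR for finger 1, even though all three subsites are the same triplet.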

The results shown in Figure 2 indicate that recognition of at least GAA and GCC
triplets by zinc fingers is also position dependent.

These positional dependences stand in contrast to earlier published work, which
suggested that zinc fingers behaved as independent modules with respect to the sequence
specificity of their binding to DNA. Desjarlais et al. (1993) Proc. Natl. Acad. Sci. USA
90:2256-2260.

Example 5: Characterization of EP2C

The engineered zinc finger protein EP2C binds to a target sequence,
GCGGTGGCT, with a dissociation constant (Kd) of 2 nM. Site selection results indicated
that fingers 1 and 2 are highly specific for their target subsites, while finger 3 selects
GCG (its intended target subsite) and GTG at approximately equal frequencies
(Figure 4A). To confirm these observations, the binding affinities of EP2C to its cognate
target sequence, and to variant target sequences, were measured by standard gel-shift
analyses (see Example 1, supra). As standards for comparison, the binding affinities of
Sp1 and zif268 to their respective targets were also measured under the same conditions,
and were determined to be 40 nM for Sp1 (target sequence GGGGCGGGG) and 2 nM
for zif268 (target sequence GCGTGGGCG). Measurements of binding affinities
confirmed that F3 of EP2C bound GTG and GCG equally well (Kds of 2 nM), but bound
GAG with a two-fold lower affinity (Figure 4B). Finger 2 was very specific for the GTG
triplet, binding 15-fold less tightly to a GGG triplet (compare 2C0 and 2C3 in Figure 4B).
Finger 1 was also very specific for the GCT triplet; it bound with 4-fold lower affinity to
a GAT triplet (2C4) and with 2-fold lower affinity to a GCG triplet (2C5). This example
shows, once again, the high degree of correlation between site selection results and
binding affinities.
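The variant sites in this example each differ from the cognate site at a single finger subsite. The sketch below, offered as an illustration only, rebuilds those sequences from the 9-bp cognate site and converts the stated fold differences into implied Kd values (cognate Kd multiplied by the fold difference). It assumes, consistent with the site selection results above, that finger 3 contacts the 5'-most triplet. The labels 2C3, 2C4 and 2C5 follow the text; the sequences are derived here rather than copied from Figure 4B, which is not reproduced, and the names are hypothetical.

```python
# Cognate EP2C site (5'->3') and its measured Kd, from this example.
COGNATE_SITE = "GCGGTGGCT"
COGNATE_KD_NM = 2.0

# Finger 3 contacts the 5'-most triplet, finger 1 the 3'-most.
FINGER_SLICE = {"F3": slice(0, 3), "F2": slice(3, 6), "F1": slice(6, 9)}

# (finger, substituted triplet, fold weaker binding than the cognate site),
# as stated in the text.
VARIANTS = [
    ("F3", "GTG", 1.0),
    ("F3", "GAG", 2.0),
    ("F2", "GGG", 15.0),  # 2C3
    ("F1", "GAT", 4.0),   # 2C4
    ("F1", "GCG", 2.0),   # 2C5
]


def variant_site(finger, triplet, site=COGNATE_SITE):
    """Return `site` with the subsite contacted by `finger` replaced by `triplet`."""
    s = FINGER_SLICE[finger]
    return site[:s.start] + triplet + site[s.stop:]


def variant_table():
    """List (variant site, implied Kd in nM), using Kd = cognate Kd x fold weaker."""
    return [(variant_site(f, t), COGNATE_KD_NM * fold) for f, t, fold in VARIANTS]
```

For instance, the finger 2 substitution gives the site GCGGGGGCT with an implied Kd of 30 nM.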


Example 6: Evaluation of engineered ZFPs by in vivo functional assays

To determine whether a correlation exists between the binding affinity of an
engineered ZFP to its target sequence and its functionality in vivo, cell-based reporter
gene assays were used to analyze the functional properties of the engineered ZFP EP2C
(see Example 5, supra). For these assays, a plasmid encoding the EP2C ZFP, fused to a
VP16 transcriptional activation domain, was used to construct a stable cell line
(T-REx-293™, Invitrogen, Carlsbad, CA) in which expression of EP2C-VP16 is inducible, as
described in Zhang et al., supra. To generate reporter constructs, three tandem copies of
the EP2C target site, or its variants (see Figure 4B, column 2), were inserted between the
MluI and BglII sites of the pGL3 luciferase-encoding vector (Promega, Madison, WI),
upstream of the SV40 promoter. Structures of all reporter constructs were confirmed by
DNA sequencing.

Luciferase reporter assays were performed by co-transfection of luciferase
reporter construct (200 ng) and pCMV-βgal (100 ng, used as an internal control) into the
EP2C cells seeded in 6-well plates. Expression of the EP2C-VP16 transcriptional
activator was induced with doxycycline (0.05 µg/ml) 24 h after transfection of reporter
constructs. Cell lysates were harvested 40 hours post-transfection, luciferase and
β-galactosidase activities were measured by the Dual-Light Reporter Assay System
(Tropix, Bedford, MA), and luciferase activities were normalized to the co-transfected
β-galactosidase activities. The results, shown on the right side of Figure 4B, showed that
the normalized luciferase activity for each reporter construct was well correlated with the
in vitro binding affinity of EP2C to the target sequence present in the construct. For
example, the target sequences to which EP2C bound with greatest affinity (2C0 and 2C2,
Kd of 2 nM for each) both stimulated the highest levels of luciferase activity when used
to drive luciferase expression in the reporter construct (Figure 4B). Target sequences to
which EP2C bound with 2-fold lower affinity, 2C1 and 2C5 (Kd of 4 nM for each),
stimulated roughly half the luciferase activity of the 2C0 and 2C2 targets. The 2C3 and
2C4 sequences, for which EP2C showed the lowest in vitro binding affinities, also
yielded the lowest levels of in vivo activity when used to drive luciferase expression.
Target 3B, a sequence to which EP2C does not bind, yielded background levels of
luciferase activity, similar to those obtained with a luciferase-encoding vector lacking
EP2C target sequences (pGL3). Thus, there are good correlations between binding
affinity (as determined by Kd measurement), binding specificity (as determined by site
selection) and in vivo functionality for engineered zinc finger proteins.
                                                           DRSHLAR  1205    1000
501  GTGGGGGTT   591     NRATLAR  796     RSDHLSR  1001    RSDALAR  1206    8
502  GGGGTGGGA   592     QSAHLAR  797     RSDALAR  1002    RSDHLSR  1207    60
507  GAGGTAGAGG  593     RSDNLAR  798     QRSALAR  1003    RSDNLAR  1208    10
508  GAGGTAGAGG  594     RSDNLAR  799     QSATLAR  1004    RSDNLAR  1209    10
509  GTCGTGTGGC  595     RSDHLTT  800     RSDALAR  1005    DRSALAR  1210    100
510  GTTGAGGAAG  596     QSGNLAR  801     RSDNLAR  1006    NRATLAR  1211    100
511  GTTGAGGAAG  597     QSGNLAR  802     RSDNLAR  1007    QSSALAR  1212    100
512  GAGGTGGAAG  598     QSGNLAR  803     RSDALAR  1008    RSDNLAR  1213    10
513  GAGGTGGAAG  599     QSANLAR  804     RSDALAR  1009    RSDNLAR  1214    1.5
514  TAGGTGGTGG  600     RSDALTR  805     RSDALAR  1010    RSDNLTT  1215    10
515  TGGGAGGAGT  601     RSDNLTR  806     RSDNLTR  1011    RSDHLTT  1216    0.5
516  GGAGGAGCT   602     TTSELRR  807     QSGHLQR  1012    QSGHLSR  1217    700
517  GGAGCTGGGG  603     RTDHLRR  808     TSSELQR  1013    QSGHLSR  1218    50
518  GGGGGAGGAG  604     QTGHLRR  809     QSGHLQR  1014    RSDHLSR  1219    30
519  GGGGAGGAGA  605     RSDNLAR  810     RSDHLSR  1015    RSDHLSR  1220    0.3
520  GGAGGAGAT   606     TTANLRR  811     QSGHLQR  1016    QSGHLSR  1221    300
521  GCAGCAGGA   607     QTGHLRR  812     QSGELQR  1017    QSGELSR  1222    1000
522  GATGAGGCA   608     QTGELRR  813     RSDNLQR  1018    TSANLSR  1223    200
527  GGGGAGGATC  609     TTSNLRR  814     RSSNLQR  1019    RSDHLSR  1224    2
528  GGGGAGGATC  610     TTSNLRR  815     RSSNLQR  1020    RSDHLSR  1225    10
529  GAGGCTTGGG  611     RTDHLRK  816     TSAELQR  1021    RSSNLSR  1226    1000
531  GCGGAGGCTT  612     TTGELRR  817     RSSNLQR  1022    RSDELSR  1227    160
532  GCGGAGGCTT  613     QSSDLQR  818     RSSNLQR  1023    RSDELSR  1228    100
533  GCGGAGGCTT  614     QSSDLQR  819     RSDNLAR  1024    RSADLSR  1229    7
534  GCGGAGGCTT  615     QSSDLQR  820     RSDNLAR  1025    RSDDLRR  1230    10
535  GCAGCCGGG   616     RTDHLRR  821     ESSDLQR  1026    QSGELSR  1231    1000
538  GCAGAGGCTT  617     QSSDLQR  822     RSDNLAR  1027    QSGSLTR  1232    70
540  TGGGCAGGCC  618     DRSHLTR  823     QSGSLTR  1028    RSDHLTT  1233    55
541  GGGGAGGAT   619     TTSNLRR  824     RSSNLQR  1029    RSDHLSR  1234    3
570  GGGGAAGGCT  620     DSGHLTR  825     QRSNLVR  1030    RSDHLTR  1235    20
571  GTGTGTGTGT  621     RSDSLTR  826     QRSNLVR  1031    RSDSLLR  1236    1000
572  GCATACGTGG  622     RSDSLLR  827
573  GCATACGTG   623     RSDSLLR  828
574  TACGTGGGGT  624     RSDHLTR  829
575  TACGTGGGCT  625     DFSHLTR  830
576  GAGGGTGTTG  626     NSDTLAR  831
577  GGAGCGGGGA  627     RSDHLSR  832
579  GGGGTTGAGG  628     RSDNLTR  833
580  GGTGTTGGAG  629     QRAHLAR  834
581  TACGTGGGTT  630     QSSHLTR  835
583  GTAGGGGTTG  631     NSSALTR  836
584  GAAGGCGGAG  632     QAGHLTR  837
585  GAAGGCGGAG  633     QAGHLTR  838
587  GGGGGTTACG  634     DKGNLQT  839
588  GGGGGGGGGG  635     RSDHLSR  840
589  GGAGTATGCT  636     DSGHLAS  841
595  TGGTTGGTAT  637     QRGSLAR  842
597  TGGTTGGTA   638     QNSAMRK  843
598  TGGTTGGTA   639     QRGSLAR  844
599  TGGTTGGTA   640     QNSAMRK  845
600  GAGTCGGAA   641     QSANLAR  846
601  GAGTCGGAA   642     RSANLTR  847
602  GAGTCGGAA   643     RSANLTR  848
603  GAGTCGGAA   644     QSGNLAR  849
606  GGGGAGGATC  645     TTSNLRR  850
TABLE 3

     TARGET      SEQ ID  F1       SEQ ID  F2       SEQ ID  F3       SEQ ID  Kd (nM)
897  GAGGAGGTGA  1261    RSDALAR  1347    RSDNLAR  1433    RSDNLVR  1519    0.07
828  GCGGAGGACC  1262    EKANLTR  1348    RSDNLAR  1434    RSDERKR  1520    0.1
884  GAGGAGGTGA  1263    RSDSLTR  1349    RSDNLAR  1435    RSDNLVR  1521    0.15
817  GAGGAGGTGA  1264    RSDSLTR  1350    RSDNLAR  1436    RSDNLAR  1522    0.31
666  GCGGAGGCGC  1265    RSDDLTR  1351    RSDNLTR  1437    RSDTLKK  1523    0.5
829  GCGGAGGACC  1266    EKANLTR  1352    RSDNLAR  1438    RSDTLKK  1524    0.52
670  GACGTGGAGG  1267    RSDNLAR  1353    RSDALAR  1439    DRSNLTR  1525    0.57
801  AAGGAGTCGC  1268    RSADLRT  1354    RSDNLAR  1440    RSDNLTQ  1526    0.85
668  GTGGAGGCCA  1269    ERGTLAR  1355    RSDNLAR  1441    RSDALAR  1527    1.13
895  ATGGATTCAG  1270    QSHDLTK  1356    TSGNLVR  1442    RSDALTQ  1528    1.4
799  GGGGGAGCTG  1271    QSSDLQR  1357    QRAHLER  1443    RSDHLSR  1529    1.85
798  GGGGGAGCTG  1272    QSSDLQR  1358    QSGHLQR  1444    RSDHLSR  1530    3
842  GAGGTGGGCT  1273    DRSHLTR  1359    RSDALAR  1445    RSDNLAR  1531    5.4
894  TCAGTGGTAT  1274    QRSALAR  1360    RSDALSR  1446    QSHDLTK  1532    6.15
892  ATGGATTCAG  1275    QSHDLTK  1361    QQSNLVR  1447    RSDALTQ  1533    6.2
888  TCAGTGGTAT  1276    QSSSLVR  1362    RSDALSR  1448    QSHDLTK  1534    14
739  GCGGGCGGGC  1277    RSDHLTR  1363    ERGHLTR  1449    RSDDLRR  1535    16.5
850  CAGGCTGTGG  1278    RSDALTR  1364    QSSDLTR  1450    RSDNLRE  1536    17
797  GCAGAGGCTG  1279    QSSDLQR  1365    RSDNLAR  1451    QSGDLTR  1537    17.5
891  TCAGTGGTAT  1280    QSSSLVR  1366    RSDALSR  1452    QSGSLRT  1538    18.5
887  TCAGTGGTAT  1281    QRSALAR  1367    RSDALSR  1453    QSGDLRT  1539    23.75
672  TCGGACGTGG  1282    RSDALAR  1368    DRSNLTR  1454    RSDELRT  1540    24
836  GGGGAGGCCC  1283    ERGTLAR  1369    RSDNLAR  1455    RSDHLSR  1541    24.25
674  GCGGCGTCGG  1284    RSDELRT  1370    RADTLRR  1456    RSDTLKK  1542    27.5
849  GGGGCCCTGG  1285    RSDALRE  1371    DRSSLTR  1457    RSDHLTQ  1543    29.05
825  GAATGGGCAG  1286    QSGSLTR  1372    RSDHLTT  1458    QSGNLTR  1544    37.3
673  GCGGGTGTCT  1287    DRSALAR  1373    QSSHLAR  1459    RSDTLKK  1545    48.33
848  GGGGAGGCCC  1288    DRSSLTR  1374    RSDNLAR  1460    RSDHLSR  1546    49.5
662  AGAGCGGCAC  1289    QTGSLTR  1375    RSDELQR  1461    QSGHLNQ  1547    50
667  GAGTCGGACG  1290    DRSNLTR  1376    RSDELRT  1462    RSDNLAR  1548    50
803  GCAGCGGCTC  1291    QSSDLQR  1377    RSDELQR  1463    QSGSLTR  1549    57.5
671  TCGGACGAGT  1292    RSDNLAR  1378    DRSNLTR  1464    RSDELRT  1550    64
851  GAGATGGATC  1293    QSSNLQR  1379    RRDVLMN  1465    RLHNLQR  1551    74
804  GCAGCGGCTC  1294    QSSDLQR  1380    RSDDLNR  1466    QSGSLTR  1552    82.5
669  GACGAGTCGG  1295    RSDELRT  1381    RSDNLAR  1467    DRSNLTR  1553    90
682  GCTGCAGGAG  1296    RSDHLAR  1382    QSGDLTR  1468    QSSDLSR  1554    90
845  GAGATGGATC  1297    QSSNLQR  1383    RSDALRQ  1469    RLHNLQR  1555    112.5
663  AGAGCGGCAC  1298    QTGSLTR  1384    RSDELQR  1470    KNWKLQA  1556    115
738  GCGGGGTCCG  1299    ERGTLTT  1385    RSDHLSR  1471    RSDDLRR  1557    120
664  AGAGCGGCAC  1300    QTGSLTR  1386    RADTLRR  1472    ASSRLAT  1558    125
833  GACTAGGACC  1301    EKANLTR  1387    RSDNLTK  1473    DRSNLTR  1559    136
685  GCTGCAGGAG  1302    RSDHLAR  1388    QSGSLTR  1474    QSSDLSR  1560    150
835  TAGGGAGCGT  1303    RADTLRR  1389    QSGHLTR  1475    RSDNLTT  1561    150
847  TAGGGAGCGT  1304    RSDDLTR  1390    QSGHLTR  1476    RSDNLTT  1562    150
818  GAATGGGCAG  1305    QSGSLTR  1391    RSDHLTT  1477    QSSNLVR  1563    167
834  GACTAGGACC  1306    EKANLTR  1392    RSDHLTT  1478    DRSNLTR  1564    186
837  GGGGCCCTGG  1307    RSDALRE  1393    DRSSLTR  1479    RSDHLSR  1565    222
764  GCAGAGGCTG  1308    TSGELVR  1394    RSDNLAR  1480    QSGDLTR  1566    255
774  GCAGCGGTAG  1309    QRSALAR  1395    RSDELQR  1481    QSGDLTR  1567    258
765  GCCGAGGCCG  1310    ERGTLAR  1396    RSDNLAR  1482    ERGTLAR  1568    262.5
766  GCCGAGGCCG  1311    ERGTLAR  1397    RSDNLAR  1483    DRSDLTR  1569    262.5
775  GCAGCGGTAG  1312    QSGALTR  1398    RSDELQR  1484    QSGDLTR  1570    265
763  GCAGAGGCTG  1313    TSGELVR  1399    RSDNLAR  1485    QSGSLTR  1571    275
838  GGGGCCCTGG  1314    RSDALRE  1400    DRSSLTR  1486    RSDHLTA  1572    300
841  GAGTGTGAGG  1315    RSDNLAR  1401    QSSHLAS  1487    RSDNLAR  1573    300
770  TTGGCAGCCT  1316    DRSSLTR  1402    QSGSLTR  1488    RSDSLTK  1574    325
767  GGGGGAGCTG  1317    QSSDLAR  1403    QSGHLQR  1489    RSDHLSR  1575    335
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SEQ 

^4 TARGET ID 

607 AAGGTGGCAG 1605
608 TTGGCTGGGC 1606
611 GTGGCTGCAG 1607
612 GTGGCTGCAG 1608
613 TTGGCTGGGC 1609
614 TTGGCTGGGC 1610
616 GAGGAGGATG 1611
617 AAGGGGGGG 1612
618 AAGGGGGGG 1613
619 AAGGGGGGG 1614
620 AAGGGGGGG 1615
621 AAGGGGGGG 1616
624 ACGGATGTCT 1617
628 TTGTAGGGGA 1618
629 TTGTAGGGGA 1619
630 CGGGGAGAGT 1620
646 TTGGTGGAAG 1621
647 TTGGTGGAAG 1622
651 GTTGTGGAAT 1623
652 TAGGAGGCTG 1624
653 TAGGAGGCTG 1625
654 TAGGCATAAA 1626
655 TAGGCATAAA 1627
656 TAGGCATAAA 1628
657 TAGGCATAAA 1629
660 GAGGGAGTTC 1630
RSDNLTT 1930 1.5
TTSDLTR 1727 RSDNLAR 1829 RSDNLTT 1931 5.5
QSGNLRT 1728 QSGSLTR 1830 RSDNLTT 1932 105
QSGNLRT 1729 QSSTLRR 1831 RSDNLTT 1933 1000
QSGNLRT 1730 QSGSLTR 1832 RSDNLTS 1934 540
QSGNLRT 1731 QSSTLRR 1833 RSDNLTS 1935 300
NRATLAR 1732 QSGHLTR 1834 RSDNLAR 1936 8.25







661 GAGGGAGTTC 1631
665 GCGGAGGCGC 1632
689 AAGGCGGAGA 1633
692 AAGGCGGAGA 1634
693 AAGGCGGAGA 1635
694 AAGGCGGAGA 1636
695 GGGGGCGAGC 1637
697 TGAGCGGCGG 1638
698 TGAGCGGCGG 1639
699 GCGGCGGCAG 1640
700 GCGGCGGCAG 1641
701 GCAGCGGAGC 1642
702 GCAGCGGAGC 1643
704 AAGGTGGCAG 1644
705 GGGGTGGGGC 1645
706 GGGGTGGGGC 1646
708 GAGTCGGAA 1647
709 GAGTCGGAA 1648
710 GAGTCGGAA 1649
711 GAGTCGGAA 1650
712 GGTGAGGAGT 1651
713 GGTGAGGAGT 1652
714 TGGGTCGCGG 1653
715 TGGGTCGCGG 1654
716 TTGGGAGCAC 1655
717 TTGGGAGCAC 1656
718 TTGGGAGCAC 1657
719 GGCATGGTGG 1658
720 GAAGAGGATG 1659
722 ATGGGGGTGG 1660
724 GGCATGGTGG 1661




TTQ AT.TP 


1 7 "X *\ 
1 / o O 


OQP"HT TP 


1 ft *3 C 


P CnTT\7TP 
KoUJJ V 1 K 


1 / Ji 


P C "TYNTT TP 

KbJJJNIlj IK 


1 Q "3 £T 


PQTYMT TD 
KoJJlNlj 1 K 


1 7 "3 £ 


P CTiT?T op 


loo / 


P CHUT TP 
KoJJJNIj 1 K 


1 7 "3 £ 

1 / JO 


p cnT?T od 


1 Q *2 Q 

looo 


P QF»TvTT.TP 
xv O UlN x_i 1 xv 


1 / O / 


■DATlTT PP 
KH.JJ 1 LKK 


loo^ 


XV O UlN l_l 1 xv. 


1 / jO 


■DATiTT PP 


1 ft A O 
1 oft U 


PQC7\TT,TP 
xvu OiNIU ± XV 


1 7 7 Q 


u xv o n x-ir-ixv 


1 ft 4 1 


XV O X_/X_j l_l X XV 


1 74 n 


XVOXJiZj J_jOxv 


1 ft A 0 


P QTi'FT.TP 
xvOxJijlj 1 rv 


1 74 1 
X / *± 1 


pcnT^T CP 


1 QA1 
lo4o 


OCOQT TP 
yoboL 1 rv 


1 / 4 


p crvnT op 


1 Q A A 

1 o4 4 


O C OTlT TP 
tJbvjD J_i IK 


1 /4 J 


T> CFlTlT OTD 

KbDJJl_(JK 


lo4 b 


P C TYNTT A P 


T 7 /l /l 
1/44 


d chut r\T) 


1 O A 

lo4 o 


PQFlTsTT &P 
rv o U1N J_i/1K 


1 7 A R 


P CriTTT OP 

KbDr_l_yK 


1 Q A 7 

lo4 / 


>y/ O VJ1-/X-J J. XV 


1 746 


XV O J_> O __LC_XV 


1 ft 4 ft 


XVO X^XXi_Lrt.XV 


1 747 


P QDCIT.AP 


1 ft 4 Q 
1 o *± 


rv O U n J_Lrt.X\. 


1 74ft 


P QTlQT.T.P 
xvoiJo i_iJ_irv 


1 ft c: n 
loDU 




1 74 Q 


pnnTT A/n 

Xv^U 1 i_i V vj 


1 ft c: i 

lODl 




1 / DU 


KiSJJ Vl_ V o 


loDZ 


OQPT\TT AP 


1 7C1 


PT HPT DT 


TOCO 

looo 


OCO'NTT BD 


1 7 CO 


POPlTT T7P 


1 O CT A 

lob4 


P GTYMT AD 


1 7 C O 
1 / DO 


P O T"YMT A "D 

KbJJJN LiAK 


1 O r r 

lobb 


PQTYMT AP 
xv o UiN 1_lH.xv 


1 1 Z> 4 


P CTiMT A P 


loDD 


PCHPT PP 

XVO J^XLljiXxV 


i 7cc 

1 / jj 


HD C A T A P 


lob/ 


P A FiTT PP 
KJ\U 1 IjKK 


i 7cc 

1 / JO 


nn CAT AD 


T Q C O 

lo bo 


nCPCT TP 


1 7 cr 7 
1 / b / 


OCOTJT /~\T> 

ybCjrlLiyK 


TOCO 

18 by 


ncrcT tp 


1 7CQ 

1 / DO 


ybtarlLi^K 


T O £ A 

lo 60 


ncpcT TP 


1 7CQ 

1 / Dj 


ncpuT op 


lob 1 


RSDALTR 1760 RSDALTS 1862
TTSNLAR 1761 RSDNLAR 1863
RSDALTR 1762 RSDHLTR 1864
RSDALTR 1763 RSDALRQ 1865
P G rYNTT A P 1 Q *3 7
KoUlNlj/iK iyo / 


1 T) 
1 . / O 


KbJJDljKK lyjo 


12 . b 


PT 'RlSTPTa 1 QTQ 


ft O R 


PCTlMT TO 1 Q/l r\ 

KbJJJNljiy iy4U 


bl 


xvljlJlNxv. l/\ li741 


Q R 

y o 


PQHTvTT.TO 1 Q4 7 


oft c; 

Zo . D 


P CPiLTT TP 1 Q A "3 


o b U 


O C OUT TI^ 1 Q/l /I 
^ovjiIIjIJX 1^44 


zUU 


OCUOT TC 1 QA cr 
^brlvjljlb iy4b 


oUU 


KbDilKKK ly4 6 


21.5 


KbDliKKK iy4 / 


4 5 


UbCabLilK 1948 


50.5


ybUDLiiR iy4y 


73.5


P QTTMT TO 1 QCn 


IT 

D 


PQPiT-JT CP 1 QC1 

KoJJrlJ_ioK lyol 


U . Ul 


PQ'HT-JT CP 1 QCO 
KoJJrlijoK lybZ 


U . Ub 


PCPlTvTT 3\D 1 QC*3 

kojjimijAk lyoo 


*3 ri n 
oUU 


P C TYNTT A P 1 Q C A 

KbJJlNJ_i/\K iyb4 


/inn 
4 U U 


"D C "PlT^TT A "D 1 Q C C 

KbDJNLiAK lybb 


400 


KbDIMLiAK 15 56 


400 


TV/f O r\TJT CD 1 QCH 

MbDHLibK 1957 


9.5


lyibiiiiijbK iyb8 


0.15


KbDHLTT 1959 


A A 

2 00 


KbDHLTT 1960 


0.46 


KGDALTS 1961 


200 


DCflAT TV 1 


15 0 


pcnAT TP 1 QCT"3 
KoJJAijlK lybo 


1 U / . b 


DRSHLAR 1964 20
QSGNLTR 1965 1.6
RSDALRQ 1966 0.7
DRSHLAR 1967 2.5



725 GCTTGAGTTA 1662
726 GAAGAGGATG 1663
727 GCGGTGGCTC 1664
728 GGTGAGGAGT 1665
729 GGAGGGGAGT 1666
730 TGGGTCGCGG 1667
731 GTGGGGGAGA 1668
732 GCGGGTGGGG 1669
733 GCGGGTGGGG 1670
734 GGGGCTGGGT 1671
735 GCGGTGGCTC 1672
736 GAGGTGGGGA 1673
737 GGAGGGGAGT 1674
740 AAGGTGGCAG 1675
741 AAGGCTGAGA 1676
742 ACGGGGTTAT 1677
743 ACGGGGTTAT 1678
744 ACGGGGTTAT 1679
745 ACGGGGTTAT 1680
746 CTGGAAGCAT 1681
747 CTATTTTGGG 1682
748 TTGGACGGCG 1683
749 TTGGACGGCG 1684
750 GAGGGAGCGA 1685
751 GGTGAGGAGT 1686
752 GAGGTGGGGA 1687
757 CGGGCGGCTG 1688
758 CGGGCGGCTG 1689
759 TTGGACGGCG 1690
760 TTGGACGGCG 1691
761 GCGGTGGCTC 1692
OQCALAP


1 7£4 
X / O *± 


^OOXlXJ^XV 


lODD 


ybbJJLiyK Xybo 


n a n a 

3 0 0 0 


OSSNT.AR 


J_ / O 


pq-nKrT.AP 

XV O U1N XXrixv 


1 ft£7 
loo / 


PlCPTsTT TP 1 QCQ 

^ovjiMj-iixv iyby 


1 c 

X . b 


V^ ■ > kJ J_/ XJ X J.Y 


1 7fifi 

-L / \J \J 


pc.nAT.cp 

- IV O -L/.r_ J_l O IV 


1 ft £ ft 

X O DO 


P QFlTT Vlf 1 OTA 
KoJJXXjxvxv iy / u 


n i 
U . X 


PCnNTT.AP 


1 7£7 
X / o / 


PQrYMT AP 
xv o UiN lx_-_xv 


loo? 


FiCO VT CO T Q T 1 

UbbivLibK Xy / 1 


15 


PCJDNT.AP 

XV i~> LJL\ XXriXV 


1 76ft 
X / o o 


P CTWT.CP 
xvOxJxxXjiDxv 


i q 7 n 
It) / u 


r\ CPUT AD 1 QTO 

ybvjxiXiA.K Xy / Z 


t n a r» 
10 00 


pcrynT.TP 


1 7£Q 
X / O .7 


npcAT AP 


1 Q 7 1 
X O / X 


pcnuT r n r n i qto 
KbxJnXjXl iy/j 


1 A Art 

1000 


PCTYNTT.AP 

IV O UxN i 'ni\ 


1 77fl 
X / / u 


pcnuT cp 
KoUxIXjoxv 


1 Q 1 1 
X O / Z 


PCPlAT A "D 1 n ~ 7 /I 

KbDAJLjAK iy /4 


12 


XV O .Dxxxjjf-lxv 


17 71 
X / / X 


OC CUT A E> 


1 O *~7 *3 


KbDDLTR 1975 


22.5


KbxJnlxtt.iv 


1 7 7 O 
X / I Z 


AO CUT A D 


lo /4 


RSDTLKK 1976 


0.32


PCnUT.AP 


17 7 7 
X / / _5 


no cpiT cp 


lo / J 


KbUrlLibK 1977 


0.25 


yooJJijiiN. 


1 774 
x / / *± 


PCTIAT CP 
xv.oJJ.r_.__jb XV 


lo / D 


KbUxiKKK Xy /o 


0.05


RSDHLAR 

X\. kJ LJi. 1-1— 1-ii.i.V 


1 77 R 
x / / _> 


XV O Ux\±J o XV 


1 ft77 
1 O / / 


P C 'n'MT CP 1 Q 1 Q 

KoiJiMijoxv xy /y 


Or A ~7 




1 77fi 

X / / D 


xVQ.Ux1±joxv 


1 ft 7 ft 
lo / o 


nPPUT CD 1 QQA 

yKvjxlijbK xyoU 


i r\ n n 
10 0 0 




1 777 
i / / / 


P CHAT ,AP 
XV 0 1^/ixxrt.xv 


1 ft 7 Q 
l o / y 


P C TiMP T A 1 Q O 1 
KbUlMxvXA XyoX 


Iz . b 


XV kJ> J-^J-NJ XJ X XV 


1 77ft 

X / / O 




i ft ft n 
loou 


PCPllSTT TO 1 QQO 
xvbUISJXjXy L^OZ 


1 c 
Xb 


ORGALAS 


1779 


XV O J_/XX XJ O XV 


1 ftftl 

1 O O 1 


iv o ij x j_i f\\j x y o j> 


z y 


ORfiAT.A^ 

V^ XV VJ.iT. XXrt. O 


1 7ft0 

X / O \J 


PCRUT.CP 
xv o un xj o xv 


1 ftftQ 
1 O O Z 


PCFlTT TO 1 QO/I 

Kbuiijiy xyb4 


10 


OR^ATiA9 

V^XV 0_a__I.T1ll_) 


1 7ft1 

X / O X 


P CTYHT.CP 
xv O L/xl Xj o xv 


1 ft ft 7 
lOOJ 


KbjjiXjjxy xybo 


o o o 
o . i j 


OP C AT, A C 


1 7QO 
1 / O __ 


p onuT CP 


1 Q Q A 


KbUlLlQ 1986 


12.5


PlQfiCT.TP 
^DuO J_i 1 xv 


1 7Q*J. 
X / C> _> 


C\ C ^"NTT A ID 


T Q O C 

Xo o b 


KbDALRE 198 7 


2.07


p CrVPT.TT 
xVOUxlxj X X 


1 7 Qzl 
X / O ft 


AQQAT PT 


1 Q Q C 

Xo o b 


UbCjALRE 198 8 


*s f\ f\ r\ 

2000 


U O V7l_J_l X XV 


X / o O 


LJKblNijx_K 


1 Q Q ~7 

loo / 


RCjDALTS 198 9 


112.3 


DP CT-TT.TP 
U XV O xl J_i X iv 


1 7ft£ 
1 / O O 


PlC CNTT T"D 
UbblNli IK 


1 Q Q Q 

Xo O O 


RCjDALTS 1990 


11.33 


IV O lv£_ J_l X XV 


1 7ft7 
X / o / 


> nCAT-TT AP 
Vbi_xlxxft.xv 


x o o y 


"D C "PiTVTT A "D 1 Q Q T 

KbDiNijAK iyyi 


52 


R^DNT.AP 

XV O XV IN J_Lrt.IV 


1 7ftft 
1 / oo 


PCTTNTT AP 
xv o UIM J_lH.iv 


i q q n 
x o y u 


TVT"DCTJT A TD *1 OCiO 

IMKbiiijAK LyyZ 


7 


RCHHT.AP 

X\-k_> 1 11 1J_LT_.IV 


1 7ft Q 

X / O -/ 


PCHAT CP 

IV o D1\±J o XV 


1 ft Q 1 
lo^l 


P C TTNTT C "D T QQO 

KbUXMLibK Xyyj 


3 1 


\J O vD LJ J—l XV XV 


1 7QH 
x / y u 


pqnDT HP 

XV O iJ XL Xj y Xv 


1 Q Q 7 

x o y z 


KbDiiliKJii iy94 


14.5


QSSDLRR 1791 RADTLRR 1893 RSDHLRE 1995 16.5
DSGHLTR 1792 DSSNLTR 1894 RGDALTS 1996 37
DRSHLTR 1793 DRSNLER 1895 RGDALTS 1997 148.5
QSSDLQR 1794 RSDALSR 1896 RSDERKR 1998 6
762


GCGGTGGCTC 1693


n^/ k_> k_) Xj XJ ^±V 


1 7QR 


PQT>Z\T CP 


1 ft Q 7 
X o y 1 


KbDiXjJ\J\. Lyyy 


18 


776 


ATGGACGGGT 1694 


PSDHLAR 

iv kj J-/ ix xxnxv 


1 796 


Iv O IN Xj XI Xv 


1 ft Qft 
X o y o 


dcfict \ir\ o a a a 
KoXJoXjJNU zUUU 


A A 

0.4


777 


ATGGACGGGT 1695 

-t Ai J- VJ VJx A. V^ VJ VJ VJ J_ J~ Vj _/ 


PSDHT.Aft 

1\- k_J i-J i. ±XXfi.±V 


1 797 

J- / -7 / 


DP QMT.TP 
xJJXolNlxj X iv 


1 ft Q Q 
x o y y 


DCH7VT OA O A A 1 


3.4


779 


CGGGGAGCAG 1696


yuvjuu x ja. 


1 7Qfl 
x / y o 




1 QAA 


DCTlUT 7\ Tji D A A O 

KoUrlixA.ilj zUUz 


A C 

0.5


780 


CGGGGAGCAG 1697


yuvJuLI X IA. 


1 7QQ 
x / y y 


nQfTPT TP 
^OvjTIXj X K 


X y U X 


P CHUT TD 7\ H A A *D 


a r~ 

0.5


781 


GGGGAGCAGC 1698


DC qiSTT.PP 
rvO OxM xjivlli 


i ft n n 

X O U U 


P QTUvTT ZiP 
iv o Ul\j xxfirv 


1 y \)£ 


KbJJHLlR 2 0 04 


4.25 


783 


TTGGGAGCGG 1699


iv 0 1^ J_i l_l ± i\. 


i pni 

X O U X 






KCjUAL lb 2 0 05 


2000 


785 


TTGGGAGCGG 1700

■J- x vj vj vjnvj V^ VJ VJ X / W V 


tVuL/ X XJIVIV 


1 ftfl n 

xO UZ 




1 OC\A 
±y\J £ ± 


"D CP\7\ T i"P O O A A f 

KbUAijlb 2 0 06 


50 


786 


-L I VjVJVJfl\JC\JvJ X / VJ X 


rvoxJ ± xjiv.lv 


i ft m 


AQPUT OP 


1 Q A C 

1 y Ub 


RODALRS 2 007 


2000 


787 


AGGGAGGATfi 17 09 


\l O xvlN xxrilv 


1 ft C\A 


P C "Pi "NTT A "D 

Kb DIM XxAK 


T Q A C 

1 y Uo 


KbDHLTQ 2 008 


4 


826 GAGGGAGCGA 1703 RSDELTR 1805 QSGHLAR 1907 RSDNLAR 2009 2.75
827 GAGGGAGCGA 1704 RADTLRR 1806 QSGHLAR 1908 RSDNLAR 2010 1.2
882 GCGTGGGCGT 1705 RSDELTR 1807 RSDHLTT 1909 RSDERKR 2011 0.01
883 GCGTGGGCGT 1706 RSDELTR 1808 RSDHLTT 1910 RSDERKR 2012 1
TABLE 5

SBS# TARGET SEQ ID F1 SEQ ID F2 SEQ ID F3 SEQ ID Kd (nM)
903 ATGGAAGGG 2013 RSDHLAR 2513 QSGNLAR 3013 RSDALRQ 3513 1.027
904 AAGGGTGAC 2014 DSSNLTR 2514 QSSHLAR 3014 RSDNLTQ 3514 1
905 GTGGTGGTG 2015 RSSALTR 2515 RSDSLAR 3015 RSDSLAR 3515 1.15
908 AAGGTCTCA 2016 QSGDLRT 2516 DRSALAR 3016 RSDNLRQ 3516 50
909 GTGGAAGAA 2017 QSGNLSR 2517 QSGNLQR 3017 RSDALAR 3517 16.4
910 ATGGAAGAT 2018 QSSNLAR 2518 QSGNLQR 3018 RSDALAQ 3518 0.03
911 ATGGGTGCA 2019 QSGSLTR 2519 QSSHLAR 3019 RSDALAQ 3519 0.91
912 TCAGAGGTG 2020 RSDSLAR 2520 RSDNLTR 3020 QSGDLRT 3520 0.135
914 CAGGAAAAG 2021 RSDNLTQ 2521 QSGNLAR 3021 RSDNLRE 3521 1.26
915 CAGGAAAAG 2022 RSDNLRQ 2522 QSGNLAR 3022 RSDNLRE 3522 45.15
916 GAGGAAGGA 2023 QSGHLAR 2523 QSGNLAR 3023 RSDNLQR 3523 1.3
919 TCATAGTAG 2024 RSDNLTT 2524 RSDNLRT 3024 QSGDLRT 3524 250
920 GATGTGGTA 2025 QSSSLVR 2525 RSDSLAR 3025 TSANLSR 3525 4
921 AAGGTCTCA 2026 QSGDLRT 2526 DPGALVR 3026 RSDNLRQ 3526 11
922 AAGGTCTCA 2027 QSHDLTK 2527 DRSALAR 3027 RSDNLRQ 3527 4
923 AAGGTCTCA 2028 QSHDLTK 2528 DPGALVR 3028 RSDNLRQ 3528 2
926 GTGGTGGTG 2029 RSDALTR 2529 RSDSLAR 3029 RSDSLAR 3529 7.502
927 CAGGTTGAG 2030 RSDNLAR 2530 TSGSLTR 3030 RSDNLRE 3530 3.61
928 CAGGTTGAG 2031 RSDNLAR 2531 QSSALTR 3031 RSDNLRE 3531 25
929 CAGGTAGAT 2032 QSSNLAR 2532 QSATLAR 3032 RSDNLRE 3532 1.3
931 GAGGAAGAG 2033 RSDNLAR 2533 QSSNLVR 3033 RSDNLAR 3533 2
932 ATGGAAGGG 2034 RSDHLAR 2534 QSSNLVR 3034 RSDALRQ 3534 797
933 GACGAGGAA 2035 QSANLAR 2535 RSDNLAR 3035 DRSNLTR 3535 500
934 ATGGAAGAT 2036 QSSNLAR 2536 QSGNLQR 3036 RSDALTS 3536 0.07
935 ATGGGTGCA 2037 QSGSLTR 2537 QSSHLAR 3037 RSDALTS 3537 0.91
937 GTGGGGGCT 2038 QSSDLTR 2538 RSDHLTR 3038 RSDSLAR 3538 0.03
938 GTGGGGGCT 2039 QSSDLRR 2539 RSDHLTR 3039 RSDSLAR 3539 0.049
939 GGGGGCTGG 2040 RSDHLTT 2540 DRSHLAR 3040 RSDHLSK 3540 0.352
940 GGGGGCTGG 2041 RSDHLTK 2541 DRSHLAR 3041 RSDHLSK 3541 1.5
941 GGGGCTGGG 2042 RSDHLAR 2542 QSSDLRR 3042 RSDKLSR 3542 0.077
942 GGGGCTGGG 2043 RSDHLAR 2543 QSSDLRR 3043 RSDHLSK 3543 0.13
943 GGGGCTGGG 2044 RSDHLAR 2544 TSGELVR 3044 RSDKLSR 3544 0.067
944 GGGGCTGGG 2045 RSDHLAR 2545 TSGELVR 3045 RSDHLSK 3545 0.027
945 GGTGCGGTG 2046 RSDSLTR 2546 RADTLRR 3046 MSHHLSR 3546 0.027
946 GGTGCGGTG 2047 RSDSLTR 2547 RSDVLQR 3047 MSHHLSR 3547 0.027
947 GGTGCGGTG 2048 RSDSLTR 2548 RSDELQR 3048 QSSHLAR 3548 0.013
948 GGTGCGGTG 2049 RSDSLTR 2549 RSDVLQR 3049 QSSHLAR 3549 0.017
962 GAGGCGGCA 2050 QSGSLTR 2550 RSDELQR 3050 RSDNLAR 3550 0.015
963 GAGGCGGCA 2051 QSGSLTR 2551 RSDDLQR 3051 RSDNLAR 3551 0.015
964 GCGGCGGTG 2052 RSDALAR 2552 RSDELQR 3052 RSDERKR 3552 0.041
965
966 
967 
968 
969 
970 
971 
972 
973 
974 
975 
976 
977 
978 
979 
980 
981 
982 
983 
984 
985 
986 
987 
988 
989 
990 
991 
993 
994 
995 
996 
997 
998 
999 
1000 
1001 
1002 
1003 
1004 
1006 
1007 
1008 
1009 
1010 
1011 
1012 
1013 



GCGGCGGCC 
GAGGAGGCC 
GAGGAGGCC 
GAGGCCGCA 
GAGGCCGCA 
GTGGGCGCC 
GTGGGCGCC 
GTGGGCGCC 
GCCGCGGTC 
GCCGCGGTC 
CAGGCCGCT 
CAGGCCGCT 
CTGGCAGTG 
CTGGCAGTG 
CTGGCGGCG 
CTGGCGGCG 
CAGGCGGCG 
CCGGGCTGG 
CCGGGCTGG 
GACGGCGAG 
GACGGCGAG 
GGTGCTGAT 
GGTGCTGAT 
GGTGCTGAT 
GGTGAGGGG 
AAGGTGGGC 
AAGGTGGGC 
GGGGCTGGG 
GGGGGCTGG 
GGGGAGGAA 
CAGTTGGTC 
AGAGAGGCT 
ACGTAGTAG 
RSDSLTQ 3095
RSDSLAR 3097
1173 GCTGAAGGG 2241 RSDHLSR 2741 QSGNLAR 3241 QSSDLRR 3741 0.008

1174 GCTGAAGGG 2242 RSDHLSR 2742 QSSNLVR 3242 QSSDLRR 3742 0.018 

1175 AAGGTCGCC 2243 DRSDLTR 2743 DPGALVR 3243 RSDNLTQ 3743 8.9 

1176 GTGGGAGCC 2244 DRSDLTR 2744 QRAHLER 3244 RSDALTR 3744 4.1 

1177 CCGGGCGCA 2245 QSGSLTR 2745 DRSHLAR 3245 RSDTLRE 3745 4.1 

1178 GAGGATGGC 2246 DRSHLAR 2746 TSGNLVR 3246 RSDNLAR 3746 0.085 

1179 GCAGCGCAG 2247 RSSNLRE 2747 RSSDLTR 3247 QSGSLTR 3747 2.735

1180 AAGGAAAGA 2248 QSGHLNQ 2748 QSGNLAR 3248 RSDNLTQ 3748 4.825

1181 TTGGCTATG 2249 RSDALRQ 2749 TSGELVR 3249 RGDALTS 3749 8.2

1182 CAGGAAGGC 2250 DRSHLAR 2750 QSGNLAR 3250 RSDNLRE 3750 1.48 

1183 CAGGAAGGC 2251 DRSHLAR 2751 QSSNLVR 3251 RSDNLRE 3751 1.935 

1184 AAGGAAAGA 2252 KNWKLQA 2752 QSGNLAR 3252 RSDNLTQ 3752 2.785 

1185 AAGGAAAGA 2253 KNWKLQA 2753 QSHNLAR 3253 RSDNLTQ 3753 5.25 

1186 GCCGAGGTG 2254 RSDSLLR 2754 RSKNLQR 3254 ERGTLAR 3754 27.5 

1187 CTGGTGGGC 2255 DRSHLAR 2755 RSDALTR 3255 RSDALRE 3755 0.006 

1188 GTAGTATGG 2256 RSDHLTT 2756 QSSSLVR 3256 QRASLAR 3756 2.74 

1189 ATGGTTGAA 2257 QSANLAR 2757 TSGALTR 3257 RSDALRQ 3757 1.51 

1190 ATGGCAGTG 2258 RSDALTR 2758 QSGDLTR 3258 RSDSLNQ 3758 1.484 

1191 ATGGCAGTG 2259 RSDALTR 2759 QSGSLTR 3259 RSDSLNQ 3759 5.325 

1192 ATGGCAGTG 2260 RSDALTR 2760 QSGDLTR 3260 RSDALTQ 3760 2.364 

1193 ATGGCAGTG 2261 RSDALTR 2761 QSGSLTR 3261 RSDALTQ 3761 3.125

1194 GAGAAGGTG 2262 RSDALTR 2762 RSDNRTA 3262 RSDNLTR 3762 2.19 

1195 GAGAAGGTG 2263 RSDALTR 2763 RSDNRTA 3263 RSSNLTR 3763 2.8 
1197 GAAGGTGCC 2264 ERGDLTR 2764 MSHHLSR 3264 QSGNLTR 3764 14.8
1199 ATGGAGAAG 2265 RSDNRTA 2765 RSDNLTR 3265 RSDALTQ 3765 3.428
1200 ATGGAGAAG 2266 RSDNRTA 2766 RSSNLTR 3266 RSDALTQ 3766 16.87
1201 ATGGAGAAG 2267 RSDNRTA 2767 RSHNLTR 3267 RSDALTQ 3767 14.8

1202 CTGGAGTAC 2268 DRSNLRT 2768 RSDNLTR 3268 RSDALRE 3768 2.834 

1203 GGAGTACTG 2269 RSDALRE 2769 QRSALAR 3269 QRAHLAR 3769 2.945

1204 GGAGTACTG 2270 RSDALRE 2770 QSSSLVR 3270 QRAHLAR 3770 4.38 

1205 CGGGCAGCT 2271 QSSDLRR 2771 QSGDLTR 3271 RSDHLRE 3771 0.9 

1206 GCGGGAGTT 2272 TTSALTR 2772 QRAHLER 3272 RSDERKR 3772 0.034 

1207 CAGGCTGGA 2273 QRAHLER 2773 TSGELVR 3273 RSDNLRE 3773 0.45 
1209 CCGGAAGCG 2274 RSDELTR 2774 QSSNLVR 3274 RSDTLRE 3774 19.28 

1211 GCAGCGCAG 2275 RSDNLRE 2775 RSDELTR 3275 QSGSLTR 3775 6.5

1212 CAGGGGGTT 2276 TTSALTR 2776 RSDHLTR 3276 RSDNLRE 3776 0.05 

1213 GAAGAAGAG 2277 RSDNLTR 2777 QSSNLVR 3277 QSGNLTR 3777 12.3 

1214 ATGGGAGTT 2278 TTSALTR 2778 QRAHLER 3278 RSDALTQ 3778 0.46 

1215 GTGGGGGCT 2279 QSSDLRR 2779 RSDHLTR 3279 RSDALTR 3779 0.003 

1217 GAAGAGGCA 2280 QSGSLTR 2780 RSDNLTR 3280 QSANLTR 3780 0.004 

1218 GCGGTGAGG 2281 RSDHLTQ 2781 RSQALTR 3281 RSDERKR 3781 0.46 

1219 AAGGAAAGG 2282 RSDHLTQ 2782 QSHNLAR 3282 RSDNLTQ 3782 0.68 
1220 AAGGAAAGG 2283 RSDHLTQ 2783 QSGNLAR 3283 RSDNLTQ 3783 0.175

1221 AAGGAAAGG 2284 RSDHLTQ 2784 QSSNLVR 3284 RSDNLTQ 3784 1.4

1222 CAGGAGGGC 2285 DRSHLAR 2785 RSDNLAR 3285 RSDNLRE 3785 0.155 

1223 ATGGACTTG 2286 RSDALTK 2786 DRSNLTR 3286 RSDALTQ 3786 7 

1224 ATGGACTTG 2287 RADALMV 2787 DRSNLTR 3287 RSDALTQ 3787 12 
2853 RSDNLAR 3353 RSDNLTT 3853 0.088
1306 CTGGCCTTG 2354 RSDALTT 2854 DCRDLAR 3354 RSDALRE 3854 2.285
1308 TGGGCAGCC 2355 ERGTLAR 2855 QSGSLTR 3355 RSDHLTT 3855 0.305
1309 TAGGAGTTT 2356 QSSALAS 2856 RSDNLAR 3356 RSDNLTT 3856 0.184
1310 TAGGAGTTT 2357 TTSALAS 2857 RSDNLAR 3357 RSDNLTT 3857 0.075
1311 TGGGCAGCC 2358 ERGDLAR 2858 QSGSLTR 3358 RSDHLTT 3858 0.91
1312 GGGGCGTGA 2359 QSGHLTK 2859 RSDELQR 3359 RSDHLSR 3859 0.23
1313 GGGGCGTGA 2360 QSGHLTT 2860 RSDELQR 3360 RSDHLSR 3860 0.09
1314 GTACAGTAG 2361 RSDNLTT 2861 RSDNLRE 3361 QSSSLVR 3861 3.09
1315 GTACAGTAG 2362 RSDNLTT 2862 RSDNLTE 3362 QSSSLVR 3862 9.27
1318 ATGGTGTGT 2363 TSSHLAS 2863 RSDALAR 3363 RSDALAQ 3863 0.048
1319 ATGGTGTGT 2364 MSHHLTT 2864 RSDALAR 3364 RSDALAQ 3864 0.228
1320 TTGGGAGAG 2365 RSDNLAR 2865 QRAHLER 3365 RSDALTT 3865 0.044
1321 TTGGGAGAG 2366 RSDNLAR 2866 QRAHLER 3366 RADALMV 3866 0.127
1322 GTGGGAATA 2367 QSGALTQ 2867 QSGHLTR 3367 RSDALTR 3867 0.799
1323 GTGGGAATA 2368 QLTGLNQ 2868 QSGHLTR 3368 RSDALTR 3868 0.744
1324 GTGGGAATA 2369 QQASLNA 2869 QSHHLTR 3369 RSDALTR 3869 18.52
1325 TTGGTTGGT 2370 TSGHLVR 2870 TSGSLTR 3370 RSDALTK 3870 0.306
1326 TTGGTTGGT 2371 TSGHLVR 2871 QSSALTR 3371 RSDALTK 3871 4.385
1327 TTGGTTGGT 2372 TSGHLVR 2872 TSGSLTR 3372 RSDALTT 3872 0.566
1328 TTGGTTGGT 2373 TSGHLVR 2873 QSSALTR 3373 RSDALTT 3873 7.95
1329 CTGGCCTGG 2374 RSDHLTT 2874 DRSDLTR 3374 RSDALRE 3874 0.68
1330 GAGGTGTGA 2375 QSGHLTT 2875 RSDALTR 3375 RSDNLAR 3875 0.175
1331 CTGGCCTGG 2376 RSDHLTT 2876 DCRDLAR 3376 RSDALRE 3876 0.388
1334 CCGGCGCTG 2377 RSDALRE 2877 RSSDLTR 3377 RSDDLRE 3877 0.31
1335 GACGCTGGC 2378 DRSHLTR 2878 QSSDLTR 3378 DSSNLTR 3878 1.4
1336 CGGGCTGGA 2379 QSGHLAR 2879 QSSDLTR 3379 RSDHLAE 3879 1.4
1337 CGGGCTGGA 2380 QSSHLAR 2880 QSSDLTR 3380 RSDHLAE 3880 0.235
1338 GGGATGGCG 2381 RSDELTR 2881 RSDALTQ 3381 RSDHLSR 3881 1.04
1339 GGGATGGCG 2382 RSDELTR 2882 RSDSLTQ 3382 RSDHLSR 3882 0.569

1340 GGGATGGCG 2383 RSDELTR 2883 RSDALTQ 3383 RSHHLSR 3883 0.751 

1341 GGGATGGCG 2384 RSDELTR 2884 RSDSLTQ 3384 RSHHLSR 3884 4.1 

1342 CAGGCGCAG 2385 RSDNLRE 2885 RSSDLTR 3385 RSDNLTE 3885 0.68 

1343 CAGGCGCAG 2386 RSDNLTT 2886 RTSTLTR 3386 RSDNLTE 3886 37.04 

1344 CCGGGCGAC 2387 DRSNLTR 2887 DRSHLAR 3387 RSDTLRE 3887 2.28 

1346 GATGTGTGA 2388 QSGHLTT 2888 RSDALAR 3388 TSANLSR 3888 0.153 

1347 CAGTGAATG 2389 RSDALTS 2889 QSHHLTT 3389 RSDNLTE 3889 8.23 

1348 GGGTCACTG 2390 RSDALTA 2890 QAATLTT 3390 RSDHLSR 3890 2.58 

1350 CAGTGAATG 2391 RSDALTQ 2891 QSGHLTT 3391 RSDNLTE 3891 74.1 

1351 GGGTCACTG 2392 RSDALRE 2892 QSHDLTK 3392 RSDHLSR 3892 0.234 

1352 GTGTGGGTC 2393 DRSALAR 2893 RSDHLTT 3393 RSDALTR 3893 0.023

1353 CTGGCGAGA 2394 QSGHLNQ 2894 RSDELQR 3394 RSDALRE 3894 56.53

1354 CTGGCGAGA 2395 KNWKLQA 2895 RSDELQR 3395 RSDALRE 3895 20.85 

1355 GCTTTGGCA 2396 QSGSLTR 2896 RSDALTT 3396 QSSDLTR 3896 0.172

1356 GCTTTGGCA 2397 QSGSLTR 2897 RADALMV 3397 QSSDLTR 3897 0.034
1357 GACTTGGTA 2398 QSSSLVR 2898 RSDALTT 3398 DRSNLTR 3898 0.032
1358 GACTTGGTA 2399 QSSSLVR 2899 RADALMV 3399 DRSNLTR 3899 0.05
1360 CAGTTGTGA 2400 QSGHLTT 2900 RADALMV 3400 RSDNLTE 3900 41.7
1361 AAGGAAAAA 2401 QKTNLDT 2901 QSGNLQR 3401 RSDNLTQ 3901 0.835
1362 AAGGAAAAA 2402 QSGNLNQ 2902 QSGNLQR 3402 RSDNLTQ 3902 0.332
1363 AAGGAAAAA 2403 QKTNLDT 2903 QRSNLVR 3403 RSDNLTQ 3903 74.1
1364 ATGGGTGAA 2404 QSANLSR 2904 QSSHLAR 3404 RSDALAQ 3904 1.22
1365 ATGGGTGAA 2405 QRSNLVR 2905 QSSHLAR 3405 RSDALAQ 3905 0.152
1366 ATGGGTGAA 2406 QSANLSR 2906 TSGHLVR 3406 RSDALAQ 3906 22.63
1367 ATGGGTGAA 2407 QRSNLVR 2907 TSGHLVR 3407 RSDALAQ 3907 1.028

1368 CTGGGAGAT 2408 QSSNLAR 2908 QRAHLER 3408 RSDALRE 3908 0.051 

1369 CTGGGAGAT 2409 QSSNLAR 2909 QSGHLTR 3409 RSDALRE 3909 0.227

1373 GTGGTGGGC 2410 DRSHLTR 2910 RSDALSR 3410 RSDALTR 3910 0.025 

1374 CCGGCGGTG 2411 RSDALTR 2911 RSDELQR 3411 RSDELRE 3911 0.003 

1375 CCGGCGGTG 2412 RSDALTR 2912 RSDDLQR 3412 RSDELRE 3912 0.008 

1376 CCGGCGGTG 2413 RSDALTR 2913 RSDERKR 3413 RSDELRE 3913 0.858 

1377 CCGGCGGTG 2414 RSDALTR 2914 RSDELQR 3414 RSDDLRE 3914 0.012

1378 CCGGCGGTG 2415 RSDALTR 2915 RSDDLQR 3415 RSDDLRE 3915 0.012 

1379 CCGGCGGTG 2416 RSDALTR 2916 RSDERKR 3416 RSDDLRE 3916 0.25 

1380 GCCGACGGT 2417 QSSHLTR 2917 DRSNLTR 3417 ERGDLTR 3917 0.076 

1381 GCCGACGGT 2418 QSSHLTR 2918 DPGNLVR 3418 ERGDLTR 3918 0.23 
1382 GCCGACGGT 2419 QSSHLTR 2919 DRSNLTR 3419 DCRDLAR 3919 3.1

1383 GCCGACGGT 2420 QSSHLTR 2920 DPGNLVR 3420 DCRDLAR 3920 1.74 

1384 GGTGTGGGC 2421 DRSHLTR 2921 RSDALSR 3421 MSHHLSR 3921 0.013 

1385 TGGGCAAGA 2422 QSGHLNQ 2922 QSGSLTR 3422 RSDHLTT 3922 0.229 

1386 TGGGCAAGA 2423 ENWKLQA 2923 QSGSLTR 3423 RSDHLTT 3923 0.193 
1389 CTGGCCTGG 2424 RSDHLTT 2924 DCRDLAR 3424 RSDALRE 3924 0.175

1393 TGGGAAGCT 2425 QSSDLRR 2925 QSGNLAR 3425 RSDHLTT 3925 0.1 

1394 TGGGAAGCT 2426 QSSDLRR 2926 QSGNLAR 3426 RSDHLTK 3926 0.04 

1395 GAAGAGGGA 2427 QSGHLQR 2927 RSDNLAR 3427 QSGNLAR 3927 0.025

1396 GAAGAGGGA 2428 QRAHLAR 2928 RSDNLAR 3428 QSGNLAR 3928 0.107
1397 GAAGAGGGA 2429 QSSHLAR 2929 RSDNLAR 3429 QSGNLAR 3929 0.14
1398 TAATGGGGG 2430 RSDHLSR 2930 RSDHLTT 3430 QSGNLRT 3930 0.065 

1399 TGGGAGTGT 2431 TKQHLKT 2931 RSDNLAR 3431 RSDHLTT 3931 0.1

1400 CCGGGTGAG 2432 RSDNLAR 2932 QSSHLAR 3432 RSDDLRE 3932 0.371

1401 GAGTTGGCC 2433 ERGTLAR 2933 RADALMV 3433 RSDNLAR 3933 0.167 

1402 CTGGAGTTG 2434 RGDALTS 2934 RSDNLAR 3434 RSDALRE 3934 0.15
1403 ATGGCAATG 2435 RSDALTQ 2935 QSGSLTR 3435 RSDALTQ 3935 0.07
1404 GAGGCAGGG 2436 RSDHLSR 2936 QSGSLTR 3436 RSDNLAR 3936 0.022
1405 GAGGCAGGG 2437 RSDHLSR 2937 QSGDLTR 3437 RSDNLAR 3937 0.045
1406 GAAGCGGAG 2438 RSDNLAR 2938 RSDELTR 3438 QSGNLAR 3938 0.025
1407 GCGGGCGCA 2439 QSGSLTR 2939 DRSHLAR 3439 RSDERKR 3939 0.585
1408 CCGGCAGGG 2440 RSDHLSR 2940 QSGSLTR 3440 RSDELRE 3940 0.305

1409 CCGGCAGGG 2441 RSDHLSR 2941 QSGSLTR 3441 RSDDLRE 3941 0.153 

1410 CCGGCGGCG 2442 RSDELTR 2942 RSDELQR 3442 RSDELRE 3942 0.814 

1411 TGAGGCGAG 2443 RSDNLAR 2943 DRSHLAR 3443 QSGHLTK 3943 0.282 

1412 CTGGCCGTG 2444 RSDSLLR 2944 ERGTLAR 3444 RSDALRE 3944 0.172 

1413 CTGGCCGCG 2445 RSDELTR 2945 DRSDLTR 3445 RSDALRE 3945 0.152 

1414 CTGGCCGCG 2446 RSDELTR 2946 ERGTLAR 3446 RSDALRE 3946 0.914 

1415 GCGGCCGAG 2447 RSDNLAR 2947 DRSDLTR 3447 RSDELQR 3947 0.102 

1416 GCGGCCGAG 2448 RSDNLAR 2948 ERGTLAR 3448 RSDELQR 3948 0.153 

1417 GAGTTGGCC 2449 ERGTLAR 2949 RGDALTS 3449 RSDNLAR 3949 1.397

1418 CTGGAGTTG 2450 RADALMV 2950 RSDNLAR 3450 RSDALRE 3950 0.241 

1422 GGGTCGGCG 2451 RSDELTR 2951 RSDDLTT 3451 RSDHLSR 3951 0.064 

1423 GGGTCGGCG 2452 RSDELTR 2952 RSDDLTK 3452 RSDHLSR 3952 0.034 

1424 CAGGGCCCG 2453 RSDELRE 2953 DRSHLAR 3453 RSDNLRE 3953 1.37 

1427 CAGGGCCCG 2454 RSDDLRE 2954 DRSHLAR 3454 RSDNLTE 3954 0.271 

1428 TGAGGCGAG 2455 RSDNLAR 2955 DRSHLAR 3455 QSVHLQS 3955 0.102 

1429 TGAGGCGAG 2456 RSDNLAR 2956 DRSHLAR 3456 QSGHLTT 3956 0.074

1430 TCGGCCGCC 2457 ERGTLAR 2957 DRSDLTR 3457 RSDDLTK 3957 0.352 

1431 TCGGCCGCC 2458 ERGTLAR 2958 DRSDLTR 3458 RSDDLAS 3958 6.17

1432 TCGGCCGCC 2459 ERGTLAR 2959 ERGTLAR 3459 RSDDLTK 3959 1.778 

1434 CTGGCCGTG 2460 RSDSLLR 2960 DRSDLTR 3460 RSDALRE 3960 0.051 

1435 TAATGGGGG 2461 RSDHLSR 2961 RSDHLTT 3461 QSGNLTK 3961 0.057 

1436 TGGGAGTGT 2462 TSDHLAS 2962 RSDNLAR 3462 RSDHLTT 3962 0.026 

1439 GGAGTGTTA 2463 QRSALAS 2963 RSDALAR 3463 QSGHLQR 3963 0.075

1440 GGAGTGTTA 2464 QSGALTK 2964 RSDALAR 3464 QSGHLQR 3964 0.035 

1441 ATAGCTGGG 2465 RSDHLSR 2965 QSSDLTR 3465 QSGALTQ 3965 0.262 

1442 TGCTGGGCC 2466 ERGTLAR 2966 RSDHLTT 3466 DRSHLTK 3966 0.36

1443 TGGAAGGAA 2467 QSGNLAR 2967 RSDNLTQ 3467 RSHHLTT 3967 0.22 

1444 TGGAAGGAA 2468 QSGNLAR 2968 RSDNLTQ 3468 RSSHLTT 3968 0.09 

1445 TGGAAGGAA 2469 QSGNLAR 2969 RLDNLTA 3469 RSHHLTT 3969 0.182 

1446 TGGAAGGAA 2470 QSGNLAR 2970 RLDNLTA 3470 RSSHLTT 3970 0.42 

1454 GGAGAGGCT 2471 QSSDLRR 2971 RSDNLAR 3471 QSGHLQR 3971 0.01 

1455 CGGGATGAA 2472 QSANLSR 2972 TSGNLVR 3472 RSDHLRE 3972 0.043 

1456 GGAGAGGCT 2473 QSSDLRR 2973 RSDNLAR 3473 QRAHLAR 3973 0.016

1457 GCAGAGGAA 2474 QSANLSR 2974 RSDNLAR 3474 QSGSLTR 3974 0.014 
1460 TTGGGGGAG 2475 RSDNLAR 2975 RSDHLTR 3475 RADALMV 3975 0.007 
1461


GACGAGGAG 


2476 




1469 


CGGGATGAA


2477 




1463 

X T O «J 


GAGGCTGTT

VJflVJvJV- J- W X X 


P478 




It u i 


GACGAGGAG


9479 




_L 1 D D 


V- X w w wraw X -L 


94 ft 0 




1 4 66 

IIOD 


CTGGGAGTT


94 ft 1 




1 4 £ fi 


ww X \jr\ lulL 


94 ft 9 




1 4 6 Q 

X *± D -7 


GGTGATGTC

w w X uri X w X w 


94 ft 3 




14 70 


ww X wtt. X w X w 


94 ft4 




14 71 


CTGGTTGGG

w X ww X X www 


94 ft R 




1 4 7? 


TTGAAGGTT 


94 ft 6 




1473 


TTGAAGGTT 


2487 




1474 


TTGAAGGTT 

x x urifiuu x x 


2488 

<1j ~ u u 




1 47 R 


TTGAAGGTT 

X X wxtltiLww x x 


2489 




1476 


TTGAAGGTT 

X X WTTJTvW W X X 


2490 
~ ^/ \j 




1477 


GCAGCCCGG

WW.tt.WW W W WW 


2491 

£u *"X ^ X 




14 79 

X i / J/ 


GAAAGTTCA


2492 




1 4 ft 0 

x t: o \j 


GAAAGTTCA


94 93 




1 4ft 1 

-L *± O X 


GAAAGTTCA

Wtt-tt_tt.W X X V — -Xi. 


94 94 




1 4 ft 9 


CCGTGTGAC

www X w X unL 


94 9 R 




1 4 ft 3 


CCGTGTGAC

V_ w w X w X unl^ 


94 96 




1 4 ft4 


GAAGTGGTA 

vjrlflvj X ww x tt. 


94 97 




1 4ft R 

X *x O _J 


AAGTGAGCT


24 98 


1 4ft 6 

X *± o u 


GGGTTTGAC

xxx unu 


9499 


s 

M 


1 4 ft 7 


TTGAAGGTT 

X X uririvju X X 


9^00 
*j \j \j 




1 4 ft ft 
X 1 o o 


AAGTGGTAG 

ttJ-iw X w w X nu 


9 R 0 1 

^ O w X 




1 4 Q o 


CTGGTTGGG

w X ww X X www 


9 R09 


o 


1 4 Q 1 

X *I -7 X 


AAGGGTTCA

ririUUu X X tn 


Z jUj 


.C35. 


1 4 Q9 


/iutlLj X ww X Ho 


9 R 04 




X *± J 


/^rt.w X ww X tt.W 


Tent; 




1 4 94 


GGGTTTGAC

www XXX wii. w 


9 R0 6 

^ J u o 




1 A Q £T 


1 X w^w^wALj 


Z D U / 




1497 


GAGGCTCTT 


2508 




1498 


GAGGTTGAT 


2509 




1499 


GAGGTTGAT 


2510 




1500 


GCAGAGGAA 


2511 




1522 


GCAATGGGT 


2512 



RSANLAR 2976 RSDNLTR 3476 DRSNLTR 3976 0.014
QSGNLAR 2977 TSGNLVR 3477 RSDHLRE 3977 0.05
TTSALTR 2978 QSSDLTR 3478 RSDNLAR 3978 0.003
RSDNLAR 2979 RSDNLTR 3479 DRSNLTR 3979 0.002
TTSALTR 2980 QSGHLQR 3480 RSDALRE 3980 0.018
NRATLAR 2981 QSGHLQR 3481 RSDALRE 3981 0.017
DRSALTR 2982 TSGNLVR 3482 MSHHLSR 3982 0.08
DRSALTR 2983 TSGNLVR 3483 TSGHLVR 3983 0.28
DRSALTR 2984 TSGNLVR 3484 QRAHLER 3984 0.156
RSDHLSR 2985 QSSALTR 3485 RSDALRE 3985 0.09
TTSALTR 2986 RSDNLTQ 3486 RADALMV 3986 3.22
TTSALTR 2987 RSDNLTQ 3487 RSDSLTT 3987 0.47
QSSALAR 2988 RSDNLTQ 3488 RADALMV 3988 1.39
QSSALAR 2989 RSDNLTQ 3489 RLHSLTT 3989 0.39
QSSALAR 2990 RSDNLTQ 3490 RSDSLTT 3990 0.305
RSDHLRE 2991 DRSDLTR 3491 QSGSLTR 3991 2.31
QSHDLTK 2992 MSHHLTQ 3492 QSGNLAR 3992 37.04
NKTDLGK 2993 TSGHLVQ 3493 QSGNLAR 3993 62.5
NKTDLGK 2994 TSDHLAS 3494 RSDELRE 3994 37.04
DRSNLTR 2995 TSDHLAS 3495 RSDELRE 3995 111.1
DRSNLTR 2996 MSHHLTT 3496 RSDELRE 3996 20.8
QSSSLVR 2997 RSDALSR 3497 QSGNLAR 3997 0.01
QSSDLRR 2998 QSGHLTT 3498 RSDNLTQ 3998 1.537
DRSNLTR 2999 TTSALAS 3499 RSDHLSR 3999 0.085
TTSALTR 3000 RSDNLTQ 3500 RLHSLTT 4000 0.188
QSSDLRR 3001 QSGHLTT 3501 RLDNRTQ 4001 5.64
RSDHLSR 3002 TSGSLTR 3502 RSDALRE 4002 0.04
NKTDLGK 3003 DSSKLSR 3503 RLDNRTA 4003 4.12
RSDNLTT 3004 RSDHLTT 3504 RSDNLTQ 4004 1.37
RSDNLTT 3005 RSDHLTT 3505 RLDNRTQ 4005 15.09
DRSNLTR 3006 QRSALAS 3506 RSDHLSR 4006 0.255
RSDNLAR 3007 RSDHLTR 3507 RSDALTT 4007 0.065
QSSALAR 3008 QSSDLTR 3508 RSDNLAR 4008 0.007
QSSNLAR 3009 QSSALTR 3509 RSDNLAR 4009 0.101
QSSNLAR 3010 TSGALTR 3510 RSDNLAR 4010 0.02
QSGNLAR 3011 RSDNLAR 3511 QSGSLTR 4011 0.003
TSGHLVR 3012 RSDALTQ 3512 QSGDLTR 4012 0.08
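In the rows above, targets are written 5'→3' and fingers are numbered F1 to F3 from the N-terminus. Comparing a row such as SBS 1173 (GCTGAAGGG: F1 RSDHLSR, F2 QSGNLAR, F3 QSSDLRR) with the recognition-code table that follows suggests that F1 contacts the 3'-most triplet and F3 the 5'-most one. The Python sketch below assumes that pairing for 9-bp targets only; the function name `finger_subsites` is hypothetical and not part of the disclosure.

```python
from __future__ import annotations


def finger_subsites(target: str) -> dict[str, str]:
    """Pair each finger of a three-finger protein with its triplet subsite.

    Assumption (hypothetical helper): the target is 9 bp, written 5'->3',
    fingers are numbered N->C, and F1 binds the 3'-most triplet.
    """
    target = target.upper()
    if len(target) != 9 or set(target) - set("ACGT"):
        raise ValueError(f"expected a 9-bp ACGT target, got {target!r}")
    # Triplets in 5'->3' order: [5' triplet, middle triplet, 3' triplet].
    triplets = [target[i:i + 3] for i in range(0, 9, 3)]
    # Reverse so that F1 gets the 3'-most triplet and F3 the 5'-most one.
    return {f"F{n}": t for n, t in enumerate(reversed(triplets), start=1)}


# SBS 1173 from the table: GCTGAAGGG, F1 RSDHLSR, F2 QSGNLAR, F3 QSSDLRR.
helices = {"F1": "RSDHLSR", "F2": "QSGNLAR", "F3": "QSSDLRR"}
for finger, triplet in finger_subsites("GCTGAAGGG").items():
    print(finger, triplet, helices[finger])
```

This prints F1 GGG RSDHLSR, F2 GAA QSGNLAR and F3 GCT QSSDLRR. Targets that are not exactly nine A/C/G/T bases, such as the 10-bp target listed for SBS 682, raise `ValueError` instead of being split.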
FINGER (N→C)


TRIPLET (5'→3')


Fl 






AGG 







RXDHXXQ


ATG 






RXDAXXQ 


CGG 






RXDHXXE 


GAA 




QXGNXXR 




GAC 


DXSNXXR 




DXSNXXR


GAG 


RXDNXXR 


RXSNXXR
RXDNXXR


RXDNXXR 


GAT 


QXSNXXR 
TXSNXXR 
TXGNXXR 


TXGNXXR




GCA 


QXGSXXR 


QXGDXXR 






GCC 


EXGTXXR 






GCG 


RXDEXXR


RXDEXXR 


RXDEXXR 
RXDTXXK 


GCT 


QXSDXXR


TXGEXXR 
QXSDXXR 




GGA 




QXGHXXR 


QXAHXXR


GGC 


DXSHXXR 


DXSHXXR 




GGG 


RXDHXXR


RXDHXXR 


RXDHXXR 
RXDHXXK 


GGT






TXGHXXR 


GTA




QXGSXXR 
QXATXXR 




GTG


RXDAXXR 
RXDSXXR 


RXDAXXR 


RXDAXXR 


TAG 




RXDNXXT 




TCG


RXDDXXK 






TGT




TXDHXXS 
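In the recognition-code table, X stands for any amino acid (the same convention used for the C2H2 consensus motif earlier in this document), and each triplet is listed with the consensus helices observed for it. The Python sketch below checks a seven-residue helix against such consensus strings. The `CODE` dictionary is an illustrative subset copied from the GGG, GAA and GCT rows; it deliberately ignores which finger position each motif belongs to, and the helper names are hypothetical.

```python
from __future__ import annotations

# Illustrative subset of the recognition-code table: triplet -> consensus helices.
# Finger-position (F1/F2/F3) assignments are NOT represented here.
CODE = {
    "GGG": ["RXDHXXR", "RXDHXXK"],
    "GAA": ["QXGNXXR"],
    "GCT": ["QXSDXXR", "TXGEXXR"],
}


def matches_consensus(helix: str, consensus: str) -> bool:
    """True if `helix` fits `consensus`, where 'X' matches any residue."""
    if len(helix) != len(consensus):
        return False
    return all(c == "X" or c == h for h, c in zip(helix.upper(), consensus.upper()))


def candidate_triplets(helix: str, code: dict[str, list[str]] | None = None) -> list[str]:
    """Triplets with at least one listed consensus helix that fits `helix`."""
    code = CODE if code is None else code
    return [t for t, motifs in code.items()
            if any(matches_consensus(helix, m) for m in motifs)]


for helix in ("RSDHLSR", "QSGNLAR", "QSSDLRR", "TSGELVR"):
    print(helix, candidate_triplets(helix))
```

Each of the four example helices, all of which occur in the tables above, fits exactly one listed triplet: RSDHLSR maps to GGG, QSGNLAR to GAA, and QSSDLRR and TSGELVR to GCT. A position-aware version would key the dictionary by (finger, triplet) instead of by triplet alone.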